Abstract. This paper develops a connection between the asymptotic stability 
of nonlinear filters and a notion of observability. We consider a general class of 
hidden Markov models in continuous time with compact signal state space, and 
call such a model observable if no two initial measures of the signal process 
give rise to the same law of the observation process. We demonstrate that 
observability implies stability of the filter, i.e., the filtered estimates become 
insensitive to the initial measure at large times. For the special case where 
the signal is a finite-state Markov process and the observations are of the 
white noise type, a complete (necessary and sufficient) characterization of filter 
stability is obtained in terms of a slightly weaker detectability condition. In 
addition to observability, the role of controllability in filter stability is explored. 
Finally, the results are partially extended to non-compact signal state spaces. 



1. Introduction 
Consider the deterministic linear control system 

$$\frac{d}{dt}x(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t),$$

where $x(t)$ is the system state, $u(t)$ is the control input and $y(t)$ is the observation 
signal. Such a system is called observable if there exist no $x \ne x'$ such that 
$\{y(t) : t \ge 0\}$ is the same when $x(0) = x$ and $x(0) = x'$ (for any control $u(t)$), and is 
called controllable if for any $x, x'$ and $t > 0$, there is a control signal $u(t)$ such that 
the solution with $x(0) = x$ satisfies $x(t) = x'$. It is well known that observability 
and controllability are intimately related with the asymptotic properties of the 
conditional estimates in the linear filtering problem 

$$dX_t = AX_t\,dt + B\,dW_t, \qquad dY_t = CX_t\,dt + dB_t,$$

where $W_t$ and $B_t$ are independent standard Wiener processes. In particular, it 
is found that the filtered estimates become insensitive to the law of $X_0$ at large 
times, i.e.,$^1$ $|E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)| \to 0$ as $t \to \infty$ for any pair of 
initial laws $X_0 \sim \mu, \nu$, whenever the associated linear control system is observable 
and controllable [55]. This is called the stability property of the filtering problem, 
and is of significant importance from the practical point of view as it ensures the 
robustness of the filtered estimates with respect to modelling errors and 
approximations. The purpose of this paper is to demonstrate that the connection 
between observability, controllability, and the linear filtering problem has a natural 
counterpart in a large variety of nonlinear filtering problems for Markovian 
signal-observation models. 

$^1$ Here $P^\mu$ is the law under which $X_0 \sim \mu$, $\mathcal{F}_t^Y = \sigma\{Y_s : s \le t\}$, and the versions of 
the conditional expectations are chosen to coincide with those computed by the 
Kalman filter. 
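
For the deterministic linear control system introduced at the start of this section, observability and controllability are usually verified through the classical Kalman rank conditions. These are a standard fact of linear systems theory and are not spelled out in the text. The following Python sketch assumes NumPy; the function names and the double-integrator matrices are hypothetical choices made purely for illustration.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1), where n is the state dimension."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

def controllability_matrix(A, B):
    """Concatenate B, AB, ..., A^(n-1)B column-wise."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def is_observable(A, C):
    # Kalman rank condition: the pair (A, C) is observable iff the rank is n.
    return bool(np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0])

def is_controllable(A, B):
    # Kalman rank condition: the pair (A, B) is controllable iff the rank is n.
    return bool(np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0])

# Hypothetical example: a double integrator (state = position, velocity).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
C_position = np.array([[1.0, 0.0]])  # observe the position only
C_velocity = np.array([[0.0, 1.0]])  # observe the velocity only

print(is_observable(A, C_position))  # position measurements determine the state
print(is_observable(A, C_velocity))  # velocity alone cannot recover the position
print(is_controllable(A, B))
```

In this example the position observation makes the system observable, while the velocity observation does not, since two initial states that differ only in position produce the same output.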

We consider a signal process $X_t$ and observation process $Y_t$ in continuous time, 
where both $X_t$ and $(X_t, Y_t)$ are assumed to be Markov processes and $Y_t$ is assumed 
to satisfy a mild condition which ensures, in essence, that the observation noise 
is memoryless (the most common observation models in continuous time, additive 
white noise observations and counting observations, both satisfy this requirement). 
In addition, we will mostly assume that the signal process takes values in a compact 
state space. The significance of the compactness assumption and some extensions 
of the results to the non-compact case are discussed in section 8. 

In this general setting, the model is called observable if there do not exist initial 
measures $\mu \ne \nu$ such that $\{Y_t : t \ge 0\}$ has the same law when $X_0 \sim \mu$ and 
$X_0 \sim \nu$. One of the main results of this paper (corollary 4.6) is that if the model is 
observable, then $|E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)| \to 0$ as $t \to \infty$ $P^\mu$-a.s. for any 
bounded continuous $f$ and any pair of absolutely continuous initial laws $\mu \ll \nu$. 

On the other hand, the notion of controllability is replaced by a certain 
regularity property of the signal transition probabilities (see definition 7.8). Under 
the additional assumption that the observations are of the white noise type and 
nondegenerate, we show that observability and regularity of the signal imply that 
$|E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)| \to 0$ as $t \to \infty$ $P^\mu$-a.s. for any bounded 
continuous $f$ and initial laws $\mu, \nu$ (corollary 7.11), without the absolute continuity 
requirement. When the signal is the solution of a stochastic differential equation, 
then the regularity property is directly related to the controllability of an 
associated deterministic control system. This result is thus entirely parallel to the 
observability-controllability criterion for the stability of the Kalman filter. 

A natural test case for the general theory is the setting where the signal state 
space is a finite set. The simplicity of this setting allows a particularly transparent 
insight into the nature of the observability and controllability properties, while this 
setting is also of particular practical importance due to the fact that the associated 
nonlinear filters are finite dimensionally computable. By combining the general 
results in this paper with known filter stability results for ergodic signals [4], we 
find that a complete characterization of filter stability is possible for finite state 
signals with nondegenerate white noise type observations. In particular, we will 
find necessary and sufficient criteria for stability (theorems 6.12 and 7.2) which 
can be verified explicitly for a given model through straightforward linear algebra 
techniques. The fact that such a complete characterization is possible, albeit in a 
particularly simple case, suggests that the notion of observability used in this paper 
is in some sense a fundamental ingredient of the filter stability problem. 

The stability of nonlinear filters has been studied actively in the last few years 
following the pioneering contributions of Ocone and Pardoux [26] and Zeitouni et al. 
[151 H]. An excellent overview of previous work and an extensive list of references 
can be found in [7]. The majority of the results on this topic assume that the 
signal is an ergodic Markov process and that the observation process is of the white 
noise type. Such results are complementary to the results obtained in this paper; 
indeed, for the complete characterization of stability in the finite state setting, it 
is essential to combine our results with results that are specific to ergodic signals. 
Moreover, ergodic results may fail to hold true for certain degenerate observation 
models (see the counterexample in [4]), while the results of this paper still hold in 
that setting. On the other hand, much is known about the rate of convergence of 
differently initialized filters in particular cases (see, e.g., [131 HI H] ) , while our results 
do not provide such information. In a much more general setting, some results on 
the asymptotic properties of nonlinear filtering errors can be found in Clark, Ocone 
and Coumarbatch [9]. Their point of view is close to the one used in this paper, 
but their results do not establish stability of the filter. 

The approach in this paper is inspired by the observation of Chigansky and 
Liptser [8] that, by virtue of a martingale convergence argument, certain predictive 
estimates of the observations are always stable. This implies, in particular, that the 
filtered estimates of the signal are stable for a particular class of functions, which 
can be written as the conditional mean of a functional of the observation process 
given the initial value of the signal (see section 0]). The heart of the argument that 
leads to the observability criterion is the characterization of this class of functions 
(proposition 3.6), which is achieved using a corollary of the Hahn-Banach theorem. 

The remainder of this paper is set up as follows. In section 2 we introduce the 
general signal-observation model that will be used in most of the paper, and fix the 
notation for the remainder of the paper. Section 3 is devoted to the study of the 
notion of observability and its connection to the class of functions that can be 
obtained as predictive estimates of the observations. Section 4 connects these 
concepts to the stability of the filter. In section 5 we show how the notion of 
observability can be characterized in the common cases of white noise type and 
counting observations, and we find a particularly simple sufficient criterion for 
observability in those cases. Section 6 is devoted to the finite state case; an explicit 
criterion is found for observability, and stability is completely characterized for 
absolutely continuous initial measures. Section 7 explores the connection between 
controllability, regularity, and the stability of filters for arbitrary initial conditions; 
a complete characterization is again given for the finite state case. Finally, section 8 
discusses the significance of the compactness assumption made in the previous 
sections, and provides some partial extensions of previous results. A simple but 
apparently unknown result for the Kalman filter is briefly discussed in the appendix. 

Before proceeding to the main part of the paper, we make a few general remarks. 

Remark 1.1. The main results of this paper can be adapted in a straightforward 
fashion to the discrete time setting. 

Remark 1.2. It is not difficult to show that when our general notions of observability 
and controllability are applied to the linear filtering model, one obtains precisely the 
classical observability-controllability criteria for the Kalman filter. Unfortunately, 
the results in section 8 for the non-compact case are not sufficiently powerful to 
recover the stability of the Kalman filter, except when the signal itself is 
asymptotically stable (i.e., the matrix $A$ has only eigenvalues with strictly negative real 
parts). The latter case is not particularly interesting for the Kalman filter, as the 
more general detectability criterion (see [26] and the appendix) makes observability 
irrelevant in this setting. The fact that the Kalman filter with an unstable signal 
is not covered is a major shortcoming of the results in this paper. 

Remark 1.3. With the exception of the Kalman filter, the finite state case, and 
observations with an invertible observation function (lemma 5.6), the observability 
property appears to be difficult to verify for a given model. For practical 
applications, it is thus necessary to develop explicitly verifiable sufficient criteria for 
observability (see section [7751 for further discussion). 

RAMON VAN HANDEL 



2. The signal-observation model 

The goal of this section is to set up the model for the signal and observation 
processes, and to fix the notation that will be used in the following. 

Let us begin by introducing the basic objects that make up the model. 

(1) The signal state space $\mathbb{S}$ is a compact Polish space. 

(2) The observation state space $\mathbb{O} = \mathbb{R}^p$ for some $p < \infty$. 

(3) The signal-observation process $(X_t, Y_t)_{t\in[0,\infty[}$ is a time-homogeneous 
$\mathbb{S}\times\mathbb{O}$-valued Feller-Markov process with cadlag paths. 

(4) The signal process $(X_t)_{t\in[0,\infty[}$ is a Feller-Markov process in its own right. 

(5) The observation process $(Y_t)_{t\in[0,\infty[}$ has conditionally independent 
increments given the signal process $(X_t)_{t\in[0,\infty[}$, and $Y_0 = 0$. 

This can be viewed as a hidden Markov model in continuous time, where $Y_t$ is the 
observable component and $X_t$ is the nonobservable component. 

For any locally compact Polish space $S$, we denote by $\mathcal{B}(S)$ the Borel $\sigma$-algebra, 
by $C(S)$ the space of continuous functions, by $C_b(S)$ the space of bounded continuous 
functions, by $C_0(S)$ the space of continuous functions that vanish at infinity, by 
$\mathcal{M}(S)$ the space of finite signed measures on $\mathcal{B}(S)$, by $\mathcal{P}(S)$ the space of probability 
measures on $\mathcal{B}(S)$, and by $\mathcal{M}_c(S)$ ($\mathcal{P}_c(S)$) the finite signed (probability) measures 
with compact support. Note that when $S$ is compact, $C(S) = C_b(S) = C_0(S)$. 

It is convenient to construct the signal-observation process $(X_t, Y_t)_{t\in[0,\infty[}$ on its 
canonical probability space. To this end, define $\Omega^X = D([0,\infty[;\mathbb{S})$ and 
$\Omega^Y = D([0,\infty[;\mathbb{O})$, i.e., $\Omega^X$ and $\Omega^Y$ are the spaces of $\mathbb{S}$-valued and $\mathbb{O}$-valued cadlag 
paths, endowed with the Skorokhod topology. We will work on the probability 
space $\Omega = \Omega^X \times \Omega^Y$, equipped with its Borel $\sigma$-algebra $\mathcal{F} = \mathcal{B}(\Omega^X \times \Omega^Y)$, and 
choose $X_t : \Omega \to \mathbb{S}$ and $Y_t : \Omega \to \mathbb{O}$ to be the canonical processes $X_t(x,y) = x(t)$ 
and $Y_t(x,y) = y(t)$. Furthermore, we define the natural filtrations 

$$\mathcal{F}_t^X = \sigma\{X_s : s \le t\}, \qquad \mathcal{F}_t^Y = \sigma\{Y_s : s \le t\}, \qquad \mathcal{F}_t = \sigma\{(X_s, Y_s) : s \le t\},$$

and the filtration generated by the observation increments 

$$\mathcal{G}_t^Y = \sigma\{Y_s - Y_0 : s \le t\}.$$

We will denote $\mathcal{F}^X = \mathcal{F}_\infty^X = \bigvee_{t\ge 0} \mathcal{F}_t^X$, and we define $\mathcal{F}^Y$ and $\mathcal{G}^Y$ similarly. 

Let $T_t : C_0(\mathbb{S}\times\mathbb{O}) \to C_0(\mathbb{S}\times\mathbb{O})$ and $P_t(x,y,A)$ ($t \in [0,\infty[$, $A \in \mathcal{B}(\mathbb{S}\times\mathbb{O})$) be the 
Markov semigroup of the signal-observation process and the associated transition 
probabilities. By the Feller assumption, we can construct a process with cadlag 
paths which possesses the desired transition probabilities [HI theorem 17.15]. Hence 
there exists a family of probability measures $\{P_{(x,y)} : (x,y) \in \mathbb{S}\times\mathbb{O}\}$ on $(\Omega, \mathcal{F})$ 
such that for every $(x,y)$, the process $(X_t, Y_t)$ is a Markov process with respect to 
the filtration $\mathcal{F}_t$ under $P_{(x,y)}$ with transition probabilities $P_t(x,y,A)$ and initial 
law $(X_0, Y_0) \sim \delta_{(x,y)}$, and $(x,y) \mapsto P_{(x,y)}(A)$ is measurable for every $A \in \mathcal{F}$. In 
particular, under the probability measure 

$$P^\mu(A) = \int P_{(x,y)}(A)\,\mu(dx,dy), \qquad A \in \mathcal{F}, \quad \mu \in \mathcal{P}(\mathbb{S}\times\mathbb{O}),$$
$(X_t, Y_t)$ is a Markov process with respect to the filtration $\mathcal{F}_t$ with transition 
probabilities $P_t(x,y,A)$ and initial law $(X_0, Y_0) \sim \mu$. We recall that the Markov property 
can be expressed as follows [proposition III.1.7]: for bounded $\mathcal{F}$-measurable $\xi$ 

$$E^\mu(\xi \circ \theta_t \,|\, \mathcal{F}_t) = E_{(X_t,Y_t)}(\xi) \quad P^\mu\text{-a.s. for all } t \ge 0,$$

where $E^\mu$, $E_{(x,y)}$ denote the expectations with respect to the measures $P^\mu$ and 
$P_{(x,y)}$, and $\theta_t : \Omega \to \Omega$ is the canonical shift $\theta_t(x,y)(s) = (x(s+t), y(s+t))$. 

It is convenient, without loss of generality, to replace the various $\sigma$-algebras and 
filtrations defined above by their usual augmentations with respect to the family 
$\{P^\mu : \mu \in \mathcal{P}(\mathbb{S}\times\mathbb{O})\}$ [section 1.4], and we will make this replacement from this 
point onwards. A significant advantage of this choice is that if a bounded process 
$Z_t$ has cadlag paths, and the filtration $\mathcal{G}_t$ satisfies the usual conditions, then we 
can choose a version of $E(Z_t|\mathcal{G}_t)$, for every time $t$, so that the process $t \mapsto E(Z_t|\mathcal{G}_t)$ 
has cadlag paths [12, chapter VI, theorem 47], [28, theorem 6]. In the following, 
whenever such processes are encountered, their cadlag versions are always implied. 

Finally, let us make precise the conditions on the signal and observations, i.e., 
that the signal is a Markov process in its own right and that the observation process 
has conditionally independent increments given the signal process. Both these 
properties can be simultaneously introduced through the following requirement. 

• The signal is a Markov process in its own right, and the observation 
process has conditionally independent increments given the signal process, in 
the following sense: if the random variable $\xi$ is bounded and 
$\mathcal{F}^X \vee \mathcal{G}^Y$-measurable, then the map $(x,y) \mapsto E_{(x,y)}(\xi)$ does not depend on $y$. 

Using the Markov property of $(X_t, Y_t)$, this implies that $E^\mu(\xi|\mathcal{F}_s) = E^\mu(\xi|X_s)$ 
$P^\mu$-a.s. whenever $\xi$ is $\sigma\{X_t : t \ge s\}$-measurable, which establishes that $X_t$ is an 
$\mathcal{F}_t$-Markov process as desired. On the other hand, we find that for any bounded, 
$\sigma\{Y_{t+s} - Y_s : t \ge 0\}$-measurable random variable $\xi$, there exists a measurable 
function $f : \mathbb{S} \to \mathbb{R}$ such that $E^\mu(\xi|\mathcal{F}_s) = f(X_s)$ $P^\mu$-a.s. for any initial measure $\mu$ 
(by the Markov property). This expresses the fact that the additional randomness 
introduced by the observation process is memoryless. As we will see, the two most 
common types of observations encountered in continuous time problems, white noise 
type observations and counting observations, satisfy this property. 

It remains to note that the assumption $Y_0 = 0$ means that we will be interested 
in initial measures of the form $\mu \times \delta_{\{0\}} \in \mathcal{P}(\mathbb{S}\times\mathbb{O})$, where $\mu \in \mathcal{P}(\mathbb{S})$. We therefore 
introduce the following notation: for any $\mu \in \mathcal{P}(\mathbb{S})$, we define $P^\mu = P^{\mu\times\delta_{\{0\}}}$. 
Similarly, $E^\mu$ denotes the expectation with respect to $P^\mu$. 

Remark 2.1. There is no loss of generality in assuming that $Y_0 = 0$. Indeed, consider 
an arbitrary initial measure $\mu \in \mathcal{P}(\mathbb{S}\times\mathbb{O})$. Then by [H lemma 2.4] 

$$E^\mu(f(X_t)|\mathcal{F}_t^Y) = E^{\mu(\,\cdot\,|Y_0)\times\delta_{\{Y_0\}}}(f(X_t)|\mathcal{F}_t^Y),$$

where $\mu(\,\cdot\,|Y_0)$ is a regular conditional probability of $X_0$ with respect to $Y_0$ under 
$\mu$. But note that under any initial measure of the form $\nu \times \delta_{\{y\}}$, our assumptions 
imply that $\mathcal{F}^X \vee \mathcal{G}^Y$ is independent of $\mathcal{F}_0^Y$, so that 

$$E^\mu(f(X_t)|\mathcal{F}_t^Y) = E^{\mu(\,\cdot\,|Y_0)\times\delta_{\{Y_0\}}}(f(X_t)|\mathcal{G}_t^Y) = E^{\mu(\,\cdot\,|Y_0)}(f(X_t)|\mathcal{F}_t^Y)$$

provided that we choose an appropriate version of the latter conditional expectation 
that is defined $P^\mu$-a.s. Thus it suffices to consider the case $Y_0 = 0$. 
3. Spaces of observable functions and nonobservable measures 

Broadly speaking, the goal of this section is to investigate the following question: 
what is the relation between the law of $X_0$ and the law of the observations on $\mathcal{F}^Y$? 
In the next section, we will see that this question has immediate consequences for 
filter stability. 

Definition 3.1. For $\mu, \nu \in \mathcal{P}(\mathbb{S})$, we write $\mu \simeq \nu$ whenever $P^\mu|_{\mathcal{F}^Y} = P^\nu|_{\mathcal{F}^Y}$. In 
particular, $\simeq$ defines an equivalence relation on $\mathcal{P}(\mathbb{S})$. 

In words, if $\mu \simeq \nu$, then whenever $X_0$ has the law $\mu$ or $\nu$, the same law of the 
observation process is obtained. In particular, no amount of statistics gathered from 
the observation process will allow us to distinguish between $X_0 \sim \mu$ and $X_0 \sim \nu$. 
This motivates the following notion of observability, which is reminiscent (at least 
in spirit) of the notion of observability used in linear systems theory. 

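Definition 3.2. The filtering model is called observable if $\mu \simeq \nu$ implies $\mu = \nu$. 

The following definition is key (we use the notation $\mu(f) = \int f(x)\,\mu(dx)$). 

Definition 3.3. Define the space of nonobservable measures $\mathcal{N}$ as 

$$\mathcal{N} = \{\alpha\mu_1 - \alpha\mu_2 \in \mathcal{M}(\mathbb{S}) : \alpha \in \mathbb{R},\ \mu_1, \mu_2 \in \mathcal{P}(\mathbb{S}),\ \mu_1 \simeq \mu_2\}.$$

Moreover, we define the space of observable functions $\mathcal{O}$ as 

$$\mathcal{O} = \{f \in C_b(\mathbb{S}) : \mu_1(f) = \mu_2(f) \text{ for all } \mu_1 \simeq \mu_2\}.$$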

We attach to the nonobservable space $\mathcal{N}$ the following intuitive interpretation: 
if we perturb the initial measure $X_0 \sim \mu$ in the direction $\delta \in \mathcal{N}$ ($\mu \mapsto \mu + \delta$, 
provided $\mu + \delta$ is again a probability measure), then the law of the observation 
process does not change. The observable space $\mathcal{O}$ then consists of those functions 
$f$ such that the expectation of $f(X_0)$ is completely determined by the law of the 
observation process. Note that the filtering model is observable if and only if every 
continuous function is observable, i.e., $\mathcal{O} = C_b(\mathbb{S})$, or, equivalently, if no nontrivial 
signed measure is nonobservable, i.e., $\mathcal{N} = \{0\}$. 
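
As a hedged illustration of these definitions (a worked example that is not taken from the paper), consider the extreme situation in which the law of the observation process does not depend on the initial measure at all, for instance because the observations consist of noise that is independent of the signal. Writing $\simeq$ for the equivalence relation of definition 3.1, every pair of initial measures is then equivalent, and both spaces can be written down explicitly:

```latex
% Assumption (illustration only): P^\mu|_{F^Y} = P^\nu|_{F^Y} for ALL \mu, \nu in P(S).
\mu_1 \simeq \mu_2 \quad \text{for all } \mu_1, \mu_2 \in \mathcal{P}(\mathbb{S}),
\qquad
\mathcal{N} = \{\delta \in \mathcal{M}(\mathbb{S}) : \delta(\mathbb{S}) = 0\},
\qquad
\mathcal{O} = \{f \in C_b(\mathbb{S}) : f \text{ is constant}\}.
% N: every \alpha\mu_1 - \alpha\mu_2 has total mass zero, and conversely any \delta \neq 0
%    with \delta(S) = 0 equals \alpha(\delta^+/\alpha - \delta^-/\alpha), \alpha = \delta^+(S).
% O: testing \mu_1(f) = \mu_2(f) on point masses forces f(x) = f(x') for all x, x'.
```

This is the opposite end of the spectrum from an observable model, where $\mathcal{N}$ is trivial and $\mathcal{O}$ is all of $C_b(\mathbb{S})$.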

Our goal is to characterize the space $\mathcal{O}$. Before we do this, let us recall a simple 
functional analytic device which will be needed below [30, chapter 4]. Let $B$ be a 
Banach space and denote by $B^*$ its topological dual. Consider two (not necessarily 
closed) linear subspaces $M \subset B$ and $N \subset B^*$. 

Definition 3.4. The annihilator $M^\perp \subset B^*$ of $M$ is defined as 

$$M^\perp = \{x^* \in B^* : \langle x^*, x\rangle = 0 \text{ for all } x \in M\}.$$

Similarly, the annihilator $N^\perp \subset B$ of $N$ is defined as 

$$N^\perp = \{x \in B : \langle x^*, x\rangle = 0 \text{ for all } x^* \in N\}.$$

The proof of the following lemma [30, theorem 4.7] follows from a straightforward 
application of the Hahn-Banach theorem. 

Lemma 3.5. $(M^\perp)^\perp = \overline{M}$, where $\overline{M}$ is the (norm-) closure of $M$ in $B$. 

Recall that $\mathcal{M}(\mathbb{S})$ is the topological dual of $C_b(\mathbb{S})$ by the Riesz-Markov theorem. 
It is thus easily verified from the definitions that $\mathcal{O} = \mathcal{N}^\perp$. What we will show 
is that there is a dense subset $\mathcal{O}^0 \subset \mathcal{O}$ such that every $f \in \mathcal{O}^0$ can be written as 
$f(x) = E_{(x,y)}(\xi)$ for some bounded $\mathcal{G}^Y$-measurable random variable $\xi$. 
Proposition 3.6. Let $\mathcal{O}^0$ be the linear span of functions of the form 

$$E_{(x,y)}\big(f_1(Y_{t_1} - Y_0)\, f_2(Y_{t_2} - Y_0) \cdots f_n(Y_{t_n} - Y_0)\big),$$

for all $n < \infty$, $t_i \in D$ and bounded continuous functions $f_i$ on $\mathbb{O}$, where $D$ is a 
dense subset of $[0,\infty[$. Then $\mathcal{O}^0$ is dense in $\mathcal{O}$. In particular, for any observable 
function $f \in \mathcal{O}$, there is a sequence of functions $f_n \in \mathcal{O}^0$ such that $\|f - f_n\| \to 0$. 

Proof. By our assumptions, any $f \in \mathcal{O}^0$ is a function on $\mathbb{S}$ only, and we find 

$$E_{(x,y)}\big(f_1(Y_{t_1} - Y_0) \cdots f_n(Y_{t_n} - Y_0)\big) = E_{(x,0)}\big(f_1(Y_{t_1} - Y_0) \cdots f_n(Y_{t_n} - Y_0)\big) = E_{(x,0)}\big(f_1(Y_{t_1})\, f_2(Y_{t_2}) \cdots f_n(Y_{t_n})\big).$$

We claim that $\mathcal{O}^0 \subset C_b(\mathbb{S})$. As $\mathbb{S}$ is compact, it suffices to show that any $f \in \mathcal{O}^0$ is 
continuous. But if $x_n \to x$ in $\mathbb{S}$, then by [THl theorem 17.25] the measures $P_{(x_n,0)}$ 
converge weakly to $P_{(x,0)}$, and this in turn implies weak convergence of the finite 
dimensional distributions on some dense subset of times $D$ [TJ] theorem 3.7.8]. The 
continuity of $f \in \mathcal{O}^0$ follows directly from the previous expression. 

To show that $\mathcal{O}^0$ is dense in $\mathcal{O}$, it suffices to show that $(\mathcal{O}^0)^\perp = \mathcal{N}$ by lemma 
3.5 (indeed, then $\overline{\mathcal{O}^0} = ((\mathcal{O}^0)^\perp)^\perp = \mathcal{N}^\perp = \mathcal{O}$). Note that by [5, theorem 16.6] the 
finite dimensional distributions in a dense set of times form a separating class for 
probability measures on $D([0,\infty[;\mathbb{O})$. Hence a standard monotone class argument 
shows that $\mu_1 \simeq \mu_2$ if and only if 
$E^{\mu_1}(f_1(Y_{t_1}) f_2(Y_{t_2}) \cdots f_n(Y_{t_n})) = E^{\mu_2}(f_1(Y_{t_1}) f_2(Y_{t_2}) \cdots f_n(Y_{t_n}))$ for all finite sets 
of times $t_i \in D$ and bounded continuous $f_i$. But using the previous equation 
display, this is clearly the case if and only if $\mu_1(f) = \mu_2(f)$ for all $f \in \mathcal{O}^0$. Hence we 
find immediately that $\mathcal{N} \subset (\mathcal{O}^0)^\perp$. On the other hand, choose any $\mu \in (\mathcal{O}^0)^\perp$ (with 
$\mu \ne 0$), and define $\mu_1 = \mu^+/\alpha$, $\mu_2 = \mu^-/\alpha$ with $\alpha = \mu^+(\mathbb{S})$ (here $\mu = \mu^+ - \mu^-$ is the 
Hahn decomposition of $\mu$). Note that $\mu_1$ and $\mu_2$ are both probability measures (due 
to the fact that $1 \in \mathcal{O}^0$ implies $\mu(\mathbb{S}) = 0$), and $\mu = \alpha\mu_1 - \alpha\mu_2$. But $\mu_1(f) = \mu_2(f)$ 
for all $f \in \mathcal{O}^0$ implies $\mu_1 \simeq \mu_2$, so evidently $\mu \in \mathcal{N}$. Hence we have established the 
converse inclusion $\mathcal{N} \supset (\mathcal{O}^0)^\perp$, and the proof is complete. □ 
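
The construction of $\mu_1$ and $\mu_2$ in the last step of the proof is easy to check on a finite state space, where signed measures are simply vectors. The Python sketch below assumes NumPy; `split_signed_measure` is a hypothetical helper and the vector `mu` is an arbitrary example with total mass zero, neither of which appears in the paper.

```python
import numpy as np

def split_signed_measure(mu):
    """Return (alpha, mu1, mu2) with mu = alpha*mu1 - alpha*mu2.

    mu is a signed measure on a finite set, given as a vector whose entries
    sum to zero; mu1 and mu2 are the normalized positive and negative parts.
    """
    mu = np.asarray(mu, dtype=float)
    mu_plus = np.maximum(mu, 0.0)    # positive part mu^+
    mu_minus = np.maximum(-mu, 0.0)  # negative part mu^-
    alpha = mu_plus.sum()            # alpha = mu^+(S); equals mu^-(S) as mu(S) = 0
    if alpha == 0.0:
        raise ValueError("mu must be a nonzero signed measure")
    return alpha, mu_plus / alpha, mu_minus / alpha

# Arbitrary example on a four-point state space, with total mass zero.
mu = np.array([0.3, -0.1, -0.2, 0.0])
alpha, mu1, mu2 = split_signed_measure(mu)

print(alpha, mu1, mu2)
print(np.allclose(alpha * mu1 - alpha * mu2, mu))
```

Both `mu1` and `mu2` are probability vectors precisely because the total mass of `mu` vanishes, which is the role played by $1 \in \mathcal{O}^0$ in the proof.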

Remark 3.7. One might hope that any observable function $f \in \mathcal{O}$ can be written 
as $f(x) = E_{(x,y)}(\xi)$ for some bounded $\mathcal{G}^Y$-measurable $\xi$. This seemingly plausible 
conjecture need not hold true, however, as the following simplified example 
illustrates. Let $X$ be a $[0,1]$-valued random variable with law $\mu$, and let $Y = X + \xi$ 
where $\xi$ is Gaussian with zero mean and unit variance. Denote by $P^\mu$ the joint law 
of $X$ and $Y$. Then the same argument used in the previous proof shows that any 
continuous function $f : [0,1] \to \mathbb{R}$ can be written as the uniform limit of functions 
of the form $f_n(x) = E^{\delta_x}(g_n(Y))$. However, any $f_n$ will necessarily be a smooth 
function (being the convolution of the bounded function $g_n$ with the Gaussian 
density), so that evidently not all $f$ can be expressed in this form. Thus in general, an 
approximation result is the best one could hope for. 
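
The smoothing effect invoked in this remark can be made explicit. The following display is a sketch under the assumptions of the example ($\xi$ standard Gaussian, $g_n$ bounded and measurable); the particular function $f(x) = |x - \tfrac12|$ is chosen here purely for illustration and does not appear in the paper.

```latex
f_n(x) = E\big(g_n(x + \xi)\big)
       = \int_{\mathbb{R}} g_n(y)\, \frac{1}{\sqrt{2\pi}}\, e^{-(y-x)^2/2}\, dy,
\qquad
f_n'(x) = \int_{\mathbb{R}} g_n(y)\, (y - x)\, \frac{1}{\sqrt{2\pi}}\, e^{-(y-x)^2/2}\, dy.
% Every f_n is therefore differentiable (indeed smooth) on [0,1], whereas
% f(x) = |x - 1/2| is continuous on [0,1] but not differentiable at x = 1/2.
% Such an f is a uniform limit of functions f_n without being of that form itself.
```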

4. Filter stability and observability 

We now connect the notions of observability introduced in the previous section 
to the stability of the nonlinear filter. Recall that we are interested in determining, 
given a pair of initial measures $\mu, \nu$, whether $E^\mu(f(X_t)|\mathcal{F}_t^Y)$ and $E^\nu(f(X_t)|\mathcal{F}_t^Y)$ 
are close to each other for large times $t$. We will see that this is always the case 
when the function $f$ is observable, i.e., when $f \in \mathcal{O}$, provided that $\mu \ll \nu$. 

The following lemma, which is inspired by a result of Chigansky and Liptser [8, 
theorem 2.1], contains the essence of the convergence argument. 

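Lemma 4.1. Let $\mu, \nu \in \mathcal{P}(\mathbb{S})$ satisfy $\mu \ll \nu$. Moreover, let $(\xi_t)_{t\in[0,\infty[}$ be 
$\mathcal{F}^Y$-measurable random variables with $|\xi_t| \le K < \infty$ for all $t$, such that the sample 
paths $t \mapsto \xi_t$ are cadlag. Then we have 

$$|E^\mu(\xi_t|\mathcal{F}_t^Y) - E^\nu(\xi_t|\mathcal{F}_t^Y)| \xrightarrow{t\to\infty} 0 \quad P^\mu\text{-a.s.}$$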



Remark 4.2. Recall that whenever conditional expectations are encountered, the 
corresponding cadlag versions are implied. Throughout the following proofs, we 
will use the usual properties of conditional expectations to obtain equalities and 
inequalities that, for every time $t$, hold for all $\omega \in \Omega \setminus N_t$ where $N_t$ is a $P^\mu$-null set. 
Because all the processes are cadlag, however, the null set can be chosen indepen- 
dent of time t, so that these equalities and inequalities hold for all t simultaneously 
with unit probability. We will use this fact below without further comment. 

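Proof. We begin by noting that [9, lemma 2.1] 

$$\frac{dP^\mu}{dP^\nu} = \frac{d\mu}{d\nu}(X_0).$$

By the Bayes formula, we obtain $P^\mu$-a.s. 

$$E^\mu(\xi_t|\mathcal{F}_t^Y) = \frac{E^\nu\big(\xi_t\, \frac{d\mu}{d\nu}(X_0)\,\big|\,\mathcal{F}_t^Y\big)}{E^\nu\big(\frac{d\mu}{d\nu}(X_0)\,\big|\,\mathcal{F}_t^Y\big)}.$$

Introduce the notation 

$$\varrho_t = E^\nu\Big(\frac{d\mu}{d\nu}(X_0)\,\Big|\,\mathcal{F}_t^Y\Big), \qquad \varrho_\infty = E^\nu\Big(\frac{d\mu}{d\nu}(X_0)\,\Big|\,\mathcal{F}^Y\Big).$$

Then we find, using the fact that $\xi_t$ is $\mathcal{F}^Y$-measurable, 

$$\varrho_t\, |E^\mu(\xi_t|\mathcal{F}_t^Y) - E^\nu(\xi_t|\mathcal{F}_t^Y)| = |E^\nu((\varrho_\infty - \varrho_t)\, \xi_t\,|\,\mathcal{F}_t^Y)| \le K\, E^\nu(|\varrho_\infty - \varrho_t|\,|\,\mathcal{F}_t^Y).$$

That this expression converges to zero $P^\mu$-a.s. is established in lemma 4.3 below 
(which gives the convergence $P^\nu$-a.s., and $P^\mu \ll P^\nu$). But as $\varrho_t \to \varrho_\infty$ $P^\nu$-a.s. by 
Levy's upward theorem, we conclude the convergence 
$|E^\mu(\xi_t|\mathcal{F}_t^Y) - E^\nu(\xi_t|\mathcal{F}_t^Y)| \to 0$ as $t \to \infty$ on $\{\omega \in \Omega : \varrho_\infty(\omega) > 0\} \in \mathcal{F}^Y$ (modulo 
a $P^\mu$-null set), and the latter set has $P^\mu$-measure one. □ 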
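
The Bayes formula used in this proof can be sanity-checked on a finite toy model, where all conditional expectations are finite sums. The Python sketch below assumes NumPy and is not part of the paper: the initial laws `mu` and `nu`, the observation kernel `q` and the random variable `xi` are arbitrary illustrative choices, with a single discrete observation standing in for the observation $\sigma$-algebra.

```python
import numpy as np

def cond_exp_given_y(init, q, xi):
    """E(xi(X0, Y) | Y = y) for each y, when X0 ~ init and P(Y = y | X0 = x) = q[x, y]."""
    joint = init[:, None] * q  # joint law of (X0, Y)
    return (joint * xi).sum(axis=0) / joint.sum(axis=0)

# Illustrative choices: mu is absolutely continuous w.r.t. nu (nu charges every state).
mu = np.array([0.5, 0.5, 0.0])
nu = np.array([0.2, 0.3, 0.5])
q = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
xi = np.array([[1.0, -2.0],
               [0.5, 3.0],
               [4.0, 0.0]])

density = mu / nu  # dmu/dnu evaluated at each value of X0

# Left-hand side: conditional expectation computed directly under P^mu.
lhs = cond_exp_given_y(mu, q, xi)

# Right-hand side: Bayes formula, computed entirely under P^nu.
numerator = cond_exp_given_y(nu, q, xi * density[:, None])
denominator = cond_exp_given_y(nu, q, np.broadcast_to(density[:, None], xi.shape))
rhs = numerator / denominator

print(lhs, rhs)
```

The two vectors agree for every value of the observation that has positive probability under `mu`, which mirrors the fact that the formula holds $P^\mu$-a.s.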



The proof of the previous lemma is not yet complete, as we still need to show 
that $E^\nu(|\varrho_\infty - \varrho_t|\,|\,\mathcal{F}_t^Y) \to 0$. If we were interested in $L^1$ convergence rather than 
a.s. convergence, the result would be trivially established. Proving a.s. convergence 
would appear to be a matter of applying Hunt's lemma [12, chapter V, theorem 45], 
whose proof is easily adapted to the continuous time setting. Unfortunately, this would 
require $\varrho_t$ to be dominated by an integrable random variable, which may not be 
the case (to guarantee that this is the case we could impose, e.g., a finite relative 
entropy condition $D(\mu\|\nu) < \infty$, see [HJ chapter V, sec. 25(c)]). Instead, we proceed 
by adapting Rao's proof of Hunt's lemma [35, lemma 2] to our setting. 
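
For comparison, the $L^1$ statement mentioned above follows in one line. The display below is a sketch that uses only the tower property and the fact that $\varrho_t = E^\nu(\varrho_\infty|\mathcal{F}_t^Y)$ is a uniformly integrable martingale under $P^\nu$ (note that $E^\nu(\varrho_\infty) = 1$):

```latex
E^\nu\Big( E^\nu\big( |\varrho_\infty - \varrho_t| \,\big|\, \mathcal{F}_t^Y \big) \Big)
  = E^\nu\big( |\varrho_\infty - \varrho_t| \big)
  \xrightarrow[t \to \infty]{} 0 .
```

The difficulty addressed by lemma 4.3 is thus entirely in upgrading this to almost sure convergence of the conditional expectation itself.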



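Lemma 4.3. $E^\nu(|\varrho_\infty - \varrho_t|\,|\,\mathcal{F}_t^Y) \to 0$ $P^\nu$-a.s. 

Proof. Denote $|\varrho_\infty - \varrho_t| = u_t$ and $E^\nu(|\varrho_\infty - \varrho_t|\,|\,\mathcal{F}_t^Y) = v_t$, and fix a constant $\varepsilon > 0$. 
Define the following stopping times: 

$$\tau_1 = \inf\{t > 0 : v_t \ge \varepsilon\}, \qquad \sigma_1 = \inf\{t > \tau_1 : v_t \le \varepsilon/2\},$$

and for any $n \ge 2$ 

$$\tau_n = \inf\{t > \sigma_{n-1} : v_t \ge \varepsilon\}, \qquad \sigma_n = \inf\{t > \tau_n : v_t \le \varepsilon/2\}.$$

By right-continuity of the sample paths, $v_{\tau_n} \ge \varepsilon$ on $\{\tau_n < \infty\}$. But then 

$$\varepsilon\, P^\nu(\tau_n < \infty) = \varepsilon\, P^\nu(v_{\tau_n} I_{\tau_n<\infty} \ge \varepsilon) \le E^\nu(v_{\tau_n} I_{\tau_n<\infty}),$$

where we have used Chebyshev's inequality. But as $v_t$ is the optional projection of $u_t$ 
[12, chapter VI, theorems 43 and 47], we can write $v_{\tau_n} I_{\tau_n<\infty} = E^\nu(u_{\tau_n} I_{\tau_n<\infty}\,|\,\mathcal{F}_{\tau_n}^Y)$ 
$P^\nu$-a.s. Hence, in particular, $\varepsilon\, P^\nu(\tau_n < \infty) \le E^\nu(u_{\tau_n} I_{\tau_n<\infty})$. 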

We now claim that $\tau_n \to \infty$ as $n \to \infty$ a.s. To see this, note that $\tau_n$ is 
nondecreasing, so it must converge either to infinity or to a finite value. But if it converges 
to a finite value, then that sample path of $v_t$ must have a discontinuity of the 
second kind and hence cannot be cadlag. Thus we can conclude that $\tau_n \to \infty$ a.s., 
and hence $u_{\tau_n} \to 0$ a.s. by Levy's upward theorem (as $\varrho_t \to \varrho_\infty$ a.s.). We would 
like to show that $u_{\tau_n} \to 0$ in $L^1$, so that we can conclude that $P^\nu(\tau_n < \infty) \to 0$ as 
$n \to \infty$. To this end, note that $u_{\tau_n} \to 0$ in $L^1$ is equivalent to $\varrho_{\tau_n} \to \varrho_\infty$ in $L^1$. But 
applying again the optional projection property, we find that $\varrho_{\tau_n} = E^\nu(\varrho_\infty|\mathcal{F}_{\tau_n}^Y)$ 
$P^\nu$-a.s. Hence the desired convergence follows from Levy's upward theorem. 

We have established that $P^\nu(\tau_n < \infty) \to 0$ as $n \to \infty$. It follows directly that 
$P^\nu(\tau_n < \infty \text{ for all } n) \le \inf_n P^\nu(\tau_n < \infty) = 0$, so with unit probability either 
$\limsup_{t\to\infty} v_t \le \varepsilon$, or $\liminf_{t\to\infty} v_t \ge \varepsilon/2$. But note that $\|v_t\|_1 \le \|u_t\|_1 \to 0$ as 
$t \to \infty$, so $v_t \to 0$ in $L^1$. Hence $\liminf_{t\to\infty} v_t \ge \varepsilon/2$ can only happen on a null 
set, and we conclude that $\limsup_{t\to\infty} v_t \le \varepsilon$ a.s. As this holds for any $\varepsilon > 0$, the 
desired convergence is established. □ 

We are finally in a position to prove the main result. 

Theorem 4.4. Let $\mu \ll \nu$ and $f \in \mathcal{O}$. Then 

$$|E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)| \xrightarrow{t\to\infty} 0 \quad P^\mu\text{-a.s.}$$

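Proof. First, note that it suffices to prove the theorem for $f \in \mathcal{O}^0$. After all, 
suppose we have established the result for $\mathcal{O}^0$. By proposition 3.6 there is for 
$f \in \mathcal{O}$ a sequence $f_n \in \mathcal{O}^0$ such that $\|f - f_n\| \to 0$ as $n \to \infty$. Then 

$$\begin{aligned}
\limsup_{t\to\infty} |E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)|
&\le \limsup_{t\to\infty} |E^\mu(f(X_t) - f_n(X_t)|\mathcal{F}_t^Y)| + \limsup_{t\to\infty} |E^\nu(f_n(X_t) - f(X_t)|\mathcal{F}_t^Y)| \\
&\quad + \limsup_{t\to\infty} |E^\mu(f_n(X_t)|\mathcal{F}_t^Y) - E^\nu(f_n(X_t)|\mathcal{F}_t^Y)| \\
&\le 2\|f - f_n\| + \limsup_{t\to\infty} |E^\mu(f_n(X_t)|\mathcal{F}_t^Y) - E^\nu(f_n(X_t)|\mathcal{F}_t^Y)| \\
&= 2\|f - f_n\| \quad P^\mu\text{-a.s.}
\end{aligned}$$

But then the result follows for $f \in \mathcal{O}$ by letting $n \to \infty$. 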

We may thus assume that $f \in \mathcal{O}^0$, and by the linearity of the conditional 
expectation we may assume without loss of generality that $f$ is of the form 

$$f(x) = E_{(x,0)}(\xi), \qquad \xi = f_1(Y_{t_1} - Y_0)\, f_2(Y_{t_2} - Y_0) \cdots f_n(Y_{t_n} - Y_0),$$

for some $n < \infty$, $t_i \in D$ and bounded continuous functions $f_i$. By the Markov 
property, we find that $f(X_t) = E^\nu(\xi \circ \theta_t|\mathcal{F}_t)$ $P^\nu$-a.s. and that $f(X_t) = E^\mu(\xi \circ \theta_t|\mathcal{F}_t)$ 
$P^\mu$-a.s., so we obtain 

$$|E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)| = |E^\mu(\xi \circ \theta_t|\mathcal{F}_t^Y) - E^\nu(\xi \circ \theta_t|\mathcal{F}_t^Y)|.$$

But as the $f_i$ are continuous, $\xi_t = \xi \circ \theta_t$ has cadlag sample paths, and clearly $\xi_t$ is 
$\mathcal{F}^Y$-measurable for every $t$. It remains to apply lemma 4.1. □ 

An immediate consequence is that observability implies stability. 

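Definition 4.5. A filtering model is stable if whenever $\mu \ll \nu$, 

$$|E^\mu(f(X_t)|\mathcal{F}_t^Y) - E^\nu(f(X_t)|\mathcal{F}_t^Y)| \xrightarrow{t\to\infty} 0 \quad P^\mu\text{-a.s. for all } f \in C_b(\mathbb{S}).$$

Corollary 4.6. If the filtering model is observable, then it is stable. 

Proof. This is immediate from theorem 4.4 and the definition of observability, 
as the model is observable if and only if $\mathcal{O} = C_b(\mathbb{S})$. □ 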
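
The stability phenomenon of definition 4.5 can be visualized numerically in discrete time (remark 1.1 notes that the results adapt to that setting). The Python sketch below assumes NumPy and is only an illustration, not the paper's construction: the two-state chain, the observation function `h`, the Gaussian noise and the two initial laws are all hypothetical choices. The chain used here is also mixing, so the example shows what stability looks like rather than testing the observability criterion.

```python
import numpy as np

def hmm_filter(pi0, P, h, ys):
    """Discrete-time filter: predict with P, then reweight by a Gaussian likelihood."""
    pi = np.asarray(pi0, dtype=float)
    history = []
    for y in ys:
        predicted = pi @ P                         # one-step prediction
        likelihood = np.exp(-0.5 * (y - h) ** 2)   # unit-variance Gaussian noise
        posterior = predicted * likelihood
        pi = posterior / posterior.sum()           # normalize to a probability vector
        history.append(pi)
    return np.array(history)

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # hypothetical transition matrix
h = np.array([0.0, 1.0])     # hypothetical observation function
n_steps = 200

# Simulate one signal path and the corresponding noisy observations.
x = 0
ys = []
for _ in range(n_steps):
    x = rng.choice(2, p=P[x])
    ys.append(h[x] + rng.standard_normal())
ys = np.array(ys)

# Run the same filter from two mutually absolutely continuous initial laws.
filter_mu = hmm_filter([0.9, 0.1], P, h, ys)
filter_nu = hmm_filter([0.1, 0.9], P, h, ys)

# Total variation distance between the two filters at each time step.
tv = 0.5 * np.abs(filter_mu - filter_nu).sum(axis=1)
print(tv[0], tv[-1])
```

Both filters are driven by the same observation record, so the decay of `tv` reflects only the loss of memory of the initial law.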

Remark 4.7. A word should be said at this point about the assumptions that the 
signal process is a Markov process and that the observation process has condition- 
ally independent increments. There is nothing essential in the convergence proofs 
that depends on these properties, and indeed these can safely be dropped (in fact, 
one may then choose the observation state space O to be any locally compact Polish 
space). In this case, however, we could not guarantee that the space of observable 
functions will contain only functions on $\mathbb{S}$; instead, we would obtain $\mathcal{O} \subset C_0(\mathbb{S}\times\mathbb{O})$ 
and $\mathcal{N} \subset \mathcal{M}(\mathbb{S}\times\mathbb{O})$, and we would have to consider convergence of conditional 
expectations of the form $E(f(X_t, Y_t)|\mathcal{F}_t^Y)$. In other words, in this case the initial 
measure on the observation process can play a nontrivial role, which is not 
surprising. The setting in which we have chosen to work (where the signal dynamics does 
not depend on the observations and the observation noise is memoryless) is the 
natural setting where the initial measure on the observations decouples from the 
problem. This allows us to concentrate on filtered estimates of the signal process, 
which are the quantities which are of interest in the majority of applications. 

Remark 4.8. Our notion of stability requires that $\mu \ll \nu$. This is unavoidable if we 
wish to define the filtered estimates as conditional expectations: as $E^\mu(f(X_t)|\mathcal{F}_t^Y)$ 
is only defined up to $P^\mu$-a.s. equivalence, the comparison of $E^\mu(f(X_t)|\mathcal{F}_t^Y)$ and 
$E^\nu(f(X_t)|\mathcal{F}_t^Y)$ for $\mu \not\ll \nu$ need not make sense under any measure. In many cases, 
however, there is a natural version of the conditional expectations which may be 
defined simultaneously with respect to all $P^\mu$. In this case, one may ask whether 
the filter is strong stable, i.e., whether stability holds even for $\mu \not\ll \nu$. This typically 
requires a controllability assumption in addition to observability (section 7.3). For 
the time being we are chiefly interested in observability, but we will return to the 
strong stability problem in section 7 in the setting of white noise type observations. 

5. White noise type and counting observations 

The purpose of this section is to investigate how two specific observation models 
that are extremely common in practice — white noise and counting observations — fit 
into the general results developed in the previous subsections. 

5.1. White noise type observations. We consider the following setting: $X_t$ is
a Feller-Markov process, and $Y_t$ can be written in the form

$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + K B_t,$$

where $K$ is a non-random $p \times q$ matrix, $h : \mathbb{S} \to \mathbb{O}$ is a continuous function, and
$KB_t$ is a $p$-dimensional Wiener process, with covariance matrix $KK^*$, which is
independent of $X_t$ and $Y_0$ (for any $\mathbf{P}^\nu$). Note that $K$ may be degenerate, in which
case $v^*KB_t$ could be identically zero for certain $v \in \mathbb{R}^p$.

Lemma 5.1. The white noise type observation model satisfies the conditionally 
independent increments property. 

Proof. Let $\xi$ be any bounded, $\mathcal{F}^X \vee \mathcal{G}^Y$-measurable random variable. Then $\xi$
is $\mathcal{F}^X \vee \sigma\{KB_t : t \ge 0\}$-measurable. We claim that for any $\mathcal{F}^X \vee \sigma\{KB_t : t \ge 0\}$-measurable
random variable $\xi$, $\mathbf{E}_{(x,y)}(\xi)$ is independent of $y$. To establish
this, it suffices to prove the claim for functions of the form $\xi_1\xi_2$ where $\xi_1$ is
$\mathcal{F}^X$-measurable and $\xi_2$ is $\sigma\{KB_t : t \ge 0\}$-measurable; the statement then follows by the
monotone class theorem. But $\mathbf{E}_{(x,y)}(\xi_1\xi_2) = \mathbf{E}_{(x,y)}(\xi_1)\,\mathbf{E}_{(x,y)}(\xi_2)$ by independence,
while $\mathbf{E}_{(x,y)}(\xi_1)$ only depends on $x$ (as $X_t$ is a Markov process) and $\mathbf{E}_{(x,y)}(\xi_2)$
depends on neither $x$ nor $y$ (as $B_t$ is a Wiener process for every $\mathbf{P}^\mu$). □

As the observations only depend on the signal through the observation function 
h, a natural question is whether the observable and nonobservable spaces depend 
on the noise covariance KK*. As one might expect, this is not the case; for the 
purpose of observability, we may simply take K = 0. This is very convenient in 
computations, and shows that observability is a structural property which does not 
depend on the signal-to-noise ratio of the observations. 

Proposition 5.2. For the white noise type observation model, 

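$$\mathcal{N} = \{\alpha\mu_1 - \alpha\mu_2 \in \mathcal{M}(\mathbb{S}) : \alpha \in \mathbb{R},\ \mu_1, \mu_2 \in \mathcal{P}(\mathbb{S}),\ \mathbf{P}^{\mu_1}|_{\mathcal{F}^h} = \mathbf{P}^{\mu_2}|_{\mathcal{F}^h}\},$$

where $\mathcal{F}^h = \sigma\{h(X_t) : t \ge 0\}$.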

To prove this statement, we will need the following simple lemma. 

Lemma 5.3. Let $(Z_1, \ldots, Z_n)$ and $(Z_1', \ldots, Z_n')$ be arbitrary random variables, and
let $(\xi_1, \ldots, \xi_n)$ be Gaussian random variables independent of all $Z_i, Z_i'$. Then

$$(Z_1, \ldots, Z_n) \overset{d}{=} (Z_1', \ldots, Z_n') \quad\text{iff}\quad (Z_1 + \xi_1, \ldots, Z_n + \xi_n) \overset{d}{=} (Z_1' + \xi_1, \ldots, Z_n' + \xi_n),$$

where $\overset{d}{=}$ denotes equality in law.

Proof. Recall that a probability measure on $\mathbb{R}^n$ is uniquely determined by its
characteristic function. Denote by $\chi_Z$, $\chi_{Z'}$, $\chi_{Z+\xi}$, $\chi_{Z'+\xi}$, and $\chi_\xi$ the characteristic
functions of $(Z_1, \ldots, Z_n)$, $(Z_1', \ldots, Z_n')$, $(Z_1 + \xi_1, \ldots, Z_n + \xi_n)$, $(Z_1' + \xi_1, \ldots, Z_n' + \xi_n)$,
and $(\xi_1, \ldots, \xi_n)$, respectively. Then, by independence, $\chi_{Z+\xi} = \chi_Z\chi_\xi$ and $\chi_{Z'+\xi} = \chi_{Z'}\chi_\xi$.
But as $\xi$ is a Gaussian random vector, $\chi_\xi$ is invertible (it vanishes nowhere), so evidently
$\chi_{Z+\xi} = \chi_{Z'+\xi}$ iff $\chi_Z = \chi_{Z'}$. This establishes the claim. □

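Proof of proposition 5.2. Recall that $\mu \in \mathcal{N}$ iff there exist probability measures
$\mu_1, \mu_2 \in \mathcal{P}(\mathbb{S})$ and $\alpha \ge 0$ such that $\mu = \alpha\mu_1 - \alpha\mu_2$ and $\mathbf{P}^{\mu_1}|_{\mathcal{F}^Y} = \mathbf{P}^{\mu_2}|_{\mathcal{F}^Y}$.
By [theorem 16.6] the finite dimensional distributions form a separating class
for probability measures on $D([0,\infty[;\mathbb{O})$, so $\mu \in \mathcal{N}$ iff $\mathbf{P}^{\mu_1}|_{\mathcal{G}} = \mathbf{P}^{\mu_2}|_{\mathcal{G}}$ for all
$\mathcal{G} = \sigma\{Y_{t_1}, \ldots, Y_{t_n}\}$ with $n < \infty$ and $t_1, \ldots, t_n \in [0,\infty[$. But by lemma 5.3 this
is the case iff $\mathbf{P}^{\mu_1}|_{\mathcal{G}} = \mathbf{P}^{\mu_2}|_{\mathcal{G}}$ for all $\mathcal{G} = \sigma\{\int_0^{t_1} h(X_s)\,ds, \ldots, \int_0^{t_n} h(X_s)\,ds\}$ with
$n < \infty$ and $t_1, \ldots, t_n \in [0,\infty[$. As $\int_0^t h(X_s)\,ds$ is continuous (and in particular
càdlàg), so that the finite dimensional distributions form a separating class also for
this process, and as $\sigma\{\int_0^t h(X_s)\,ds : t \ge 0\} = \mathcal{F}^h$, the result follows. □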
5.2. Counting observations. We now turn to the case of counting observations,
for which almost identical results hold. In this setting, $X_t$ is again a Feller-Markov
process, and $Y_t = Y_0 + N_t$ where $N_t$ is a Cox process [19, proposition 10.5] with
intensity $\lambda_t = h(X_t)$, conditionally independent of $Y_0$ given $\mathcal{F}^X$, for every $\mathbf{P}^\mu$.
By definition, this implies that for any $\mu$, under a regular conditional probability
$\mathbf{P}^\mu(\,\cdot\,|\mathcal{F}^X)$ (which exists as our spaces are Polish), $N_t^i$ are independent Poisson
processes with intensities $\lambda_t^i$ and the process $N_t$ is independent of $Y_0$. Here the
observation function $h : \mathbb{S} \to \mathbb{O}$ is a continuous nonnegative function.

Lemma 5.4. The counting observation model satisfies the conditionally indepen- 
dent increments property. 

Proof. Let $\xi$ be any bounded, $\mathcal{F}^X \vee \mathcal{G}^Y$-measurable random variable. Then by our
assumptions, under a regular conditional probability $\mathbf{P}_{(x,y)}(\,\cdot\,|\mathcal{F}^X)$, the law of $\xi$
only depends on the sample paths of $X_t$ and is thus independent of $y$. In particular,
this means that $\mathbf{E}_{(x,y)}(\xi|\mathcal{F}^X)$ does not depend on $y$. But then $\mathbf{E}_{(x,y)}(\xi) =
\mathbf{E}_{(x,y)}(\mathbf{E}_{(x,y)}(\xi|\mathcal{F}^X))$ cannot depend on $y$, as $X_t$ is a Markov process in its own
right and as $\mathbf{E}_{(x,y)}(\xi|\mathcal{F}^X)$ is an $\mathcal{F}^X$-measurable random variable. □

An analog of proposition 5.2 also holds.

Proposition 5.5. The conclusion of proposition 5.2 holds identically for the
counting observation model.

Proof. This follows directly from [19, lemma 10.8]. □

5.3. A simple sufficient condition. Let us mention a useful consequence of these 
results, which leads to a particularly simple sufficient condition for observability. 

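Lemma 5.6. For the white noise type and counting observations models, it is
always the case that $f \circ h \in \mathcal{O}$ for any measurable function $f : \mathbb{O} \to \mathbb{R}$ such that
$f \circ h \in C_b(\mathbb{S})$. In particular, if $h$ is one-to-one then we may conclude that $\mathcal{O} = C_b(\mathbb{S})$
(i.e., the signal-observation model is observable).

Proof. Let $f \circ h \in C_b(\mathbb{S})$, and choose any $\mu \in \mathcal{N}$. Then $\mu = \alpha\mu_1 - \alpha\mu_2$, where
$\mathbf{P}^{\mu_1}|_{\mathcal{F}^h} = \mathbf{P}^{\mu_2}|_{\mathcal{F}^h}$. Thus in particular $\mathbf{P}^{\mu_1}|_{\sigma\{h(X_0)\}} = \mathbf{P}^{\mu_2}|_{\sigma\{h(X_0)\}}$. But $f(h(X_0))$
is $\sigma\{h(X_0)\}$-measurable, so that evidently

$$\int f(h(x))\,\mu_1(dx) = \int f(h(x))\,\mu_2(dx).$$

As this holds for any $\mu \in \mathcal{N}$, we find that $f \circ h \in \mathcal{O}$. □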

In other words, "nice" functions of the observation function are always observ- 
able, regardless of any further properties of the model. 

Remark 5.7. For the special case where $f$ is chosen to be the identity, the stability
of the observation function (in a slightly different sense) was found in [21, theorem
3.1] under much weaker conditions. However, the latter result cannot be used to 
conclude the stability of the filter, even in the case when h is one-to-one. 
6. Finite state signals

The simplest nonlinear filtering model is one where the signal state space consists 
of a finite number of points. Such models are of particular theoretical and practical 
interest as the filtered estimates can be finite dimensionally computed. On the 
other hand, this model shares many of the features of more general models and 
thus serves as a convenient prototype. In this section, we will use the results of the 
previous section to obtain an essentially complete characterization of the stability 
of such filters in the case of nondegenerate white noise type observations. 

Throughout this section, $X_t$ is a Markov process on the finite state space $\mathbb{S} =
\{a_1, \ldots, a_d\}$ with transition intensities matrix $\Lambda = (\lambda_{ij})_{1 \le i,j \le d}$. The observations
process $Y_t$ is taken to be one-dimensional and of the form

$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + \kappa W_t, \qquad \kappa \ge 0,$$

where $h : \mathbb{S} \to \mathbb{R}$. The restriction to one dimension is for notational convenience
only; all the results extend directly to observations in $\mathbb{R}^p$.

Remark 6.1. $Y_0$ is included as a reminder that this is a Markov observation model.
As usual, we will enforce $Y_0 = 0$ by working with the measures $\mathbf{P}^\mu$.

In subsection 6.1 we elaborate on the structure of the spaces $\mathcal{N}$ and $\mathcal{O}$ in this
setting (the results of this section hold identically for the case of counting
observations). Subsection 6.2 (see also section 7.2) is devoted to the complete
characterization of the stability of the filter. Here we make essential use of the white noise
type observations, and it is moreover crucial that the observations are assumed to
be nondegenerate, $\kappa > 0$. The reason for this is that we will invoke results that hold
only in this setting; see remark 6.13 below for further details.

Finally, a word on notation. We use the following notation for the filter: $\pi_t^\mu(f) =
\mathbf{E}^\mu(f(X_t)|\mathcal{F}_t^Y)$. When $\pi_t^\mu$ is used as a vector, this is implied in the sense that
$(\pi_t^\mu)^i = \pi_t^\mu(I_{\{a_i\}})$. We will interchangeably treat functions on $\mathbb{S}$ as vectors in $\mathbb{R}^d$
in the obvious way ($v^i = v(a_i)$), whenever this is convenient. The transpose of a
vector or matrix is denoted as $v^*$ or $M^*$.

6.1. Observability. For the particular case of a finite state signal and $\kappa = 0$, the
notion of observability has been investigated in the context of identifiability and
lumpability of hidden Markov models ([TTl], [23], [15]), though chiefly in discrete time.
In this subsection, we briefly develop the necessary results in our setting. 

Let $\mu, \nu \in \mathcal{P}(\mathbb{S})$. To determine the nonobservable space $\mathcal{N}$, by proposition 5.2 we
need to find all $\mu, \nu$ with $\mathbf{P}^\mu|_{\mathcal{F}^h} = \mathbf{P}^\nu|_{\mathcal{F}^h}$. But as $h(X_t)$ is a càdlàg process, it suffices
to verify that the finite dimensional distributions of $h(X_t)$ are the same under $\mathbf{P}^\mu$
and $\mathbf{P}^\nu$ [theorem 16.6]. We thus begin by computing these distributions.

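Lemma 6.2. Let $H = h(\mathbb{S}) = \{b_1, \ldots, b_r\}$, $r \le d$ be the set of possible observation
values. Define the $d \times d$ projection matrices $H_{b_k}$ such that $(H_{b_k})_{ij} = 1$ whenever
$i = j$ and $h(a_i) = b_k$, and zero otherwise. Then under $\mathbf{P}^\mu$, the finite dimensional
distributions of $h(X_t)$ have the form (for $0 \le t_1 \le \cdots \le t_k$)

$$\mathbf{P}^\mu(h(X_0) = n_0, h(X_{t_1}) = n_1, \ldots, h(X_{t_k}) = n_k) = \mu^* H_{n_0} e^{\Lambda t_1} H_{n_1} e^{\Lambda(t_2 - t_1)} H_{n_2} \cdots e^{\Lambda(t_k - t_{k-1})} H_{n_k} \mathbf{1},$$

where $n_i \in H$ and $\mathbf{1} \in \mathbb{R}^d$ is the vector of ones.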
Proof. The result follows from

$$\mathbf{P}^\mu(X_0 = m_0, X_{t_1} = m_1, \ldots, X_{t_k} = m_k) = \mu_{m_0}\,(e^{\Lambda t_1})_{m_0 m_1}(e^{\Lambda(t_2 - t_1)})_{m_1 m_2} \cdots (e^{\Lambda(t_k - t_{k-1})})_{m_{k-1} m_k}$$

by summing $m_j$ over the set $\{a_i \in \mathbb{S} : h(a_i) = n_j\}$. □



We immediately conclude the following. 
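Corollary 6.3. The observable and nonobservable spaces satisfy

$$\mathcal{O} = \mathrm{span}\{H_{n_0} e^{\Lambda s_1} H_{n_1} e^{\Lambda s_2} H_{n_2} \cdots e^{\Lambda s_k} H_{n_k} \mathbf{1} : k \ge 0,\ s_i \ge 0,\ n_i \in H\},$$
$$\mathcal{N} = \mathcal{O}^\perp = \{v \in \mathbb{R}^d : v^*x = 0 \text{ for all } x \in \mathcal{O}\}.$$

The model is observable if and only if $\dim \mathcal{O} = d$.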

The following simplification is useful in computations. 
Lemma 6.4. The observable space can be characterized as follows:

$$\mathcal{O} = \mathrm{span}\{H_{n_0} \Lambda H_{n_1} \Lambda \cdots \Lambda H_{n_k} \mathbf{1} : k \ge 0,\ n_i \in H\}.$$

Proof. Note that any vector of the form ($p_i \le d - 1$)

$$H_{n_0} \Lambda^{p_1} H_{n_1} \Lambda^{p_2} H_{n_2} \cdots \Lambda^{p_k} H_{n_k} \mathbf{1}$$

can be obtained from a vector of the form

$$H_{n_0} e^{\Lambda s_1} H_{n_1} e^{\Lambda s_2} H_{n_2} \cdots e^{\Lambda s_k} H_{n_k} \mathbf{1}$$

by taking derivatives with respect to $s_i$ (at $s_i = 0$), and in particular the former is the limit
of elements of $\mathcal{O}$. But $\mathcal{O}$ is closed as it is a finite dimensional linear space, so the
span of the former is contained in $\mathcal{O}$. To prove the converse inclusion, it suffices
to expand the matrix exponential in a power series and apply the Cayley-Hamilton
theorem. Finally, note that $H_{b_1}, \ldots, H_{b_r}$ sum to the identity matrix, so we can reduce to
the case where $p_i = 1$ for all $i$. □

We will need, in particular, the following important consequence. 

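Corollary 6.5. $\mathcal{O}$ is invariant under $\Lambda$ and $H_{b_k}$: $\Lambda\mathcal{O} \subset \mathcal{O}$, $H_{b_k}\mathcal{O} \subset \mathcal{O}$. Similarly
$\mathcal{N}$ is invariant under $\Lambda^*$ and $H_{b_k}$: $\Lambda^*\mathcal{N} \subset \mathcal{N}$, $H_{b_k}\mathcal{N} \subset \mathcal{N}$.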

Proof. Immediate from the previous lemma. □ 

Remark 6.6. Given the previous corollary, it is not surprising that $\mathcal{O}$ can in fact be
characterized by its invariance property. To this end, denote by

$$\mathcal{O}_h = \{f \circ h : f : H \to \mathbb{R}\} = \mathrm{span}\{H_{b_i}\mathbf{1} : b_i \in H\}.$$

Then $\mathcal{O}$ is the smallest subspace of $\mathbb{R}^d$ that contains $\mathcal{O}_h$ and is invariant under $\Lambda$
and all $H_{b_k}$. Indeed, let us call this smallest subspace $\mathcal{O}'$. Clearly $\mathcal{O}' \subset \mathcal{O}$, as $\mathcal{O}$
contains $\mathcal{O}_h$ and is invariant under $\Lambda$ and $H_{b_k}$. On the other hand, every element
of $\mathcal{O}$ can be generated from elements in $\mathcal{O}_h$ by a finite number of multiplications
by $\Lambda$ and $H_{b_k}$ and linear combinations. Hence $\mathcal{O} \subset \mathcal{O}'$.

To verify observability, we could proceed as follows. Denote

$$Z_1 = \mathcal{O}_h, \qquad Z_n = Z_{n-1} + \Lambda Z_{n-1} + H_{b_1} Z_{n-1} + \cdots + H_{b_r} Z_{n-1}, \quad n > 1,$$

where the sum of two linear spaces denotes their linear span. It is evident that
every element of $\mathcal{O}$ will be in $Z_n$ for some $n$. Moreover, if $Z_n = Z_{n+1}$ for some
$n = m$, then it is true for all $n > m$, and in particular $Z_m = \mathcal{O}$. Finally, we claim
that this will always be the case for some $m \le d$. Indeed, the dimension of $Z_n$ cannot
shrink with increasing $n$, but it cannot grow larger than $d$ as we are working
in $\mathbb{R}^d$. As $\mathcal{O}_h$ contains at least the constants, the procedure must complete in at
most $d - 1$ steps. This idea is classical, see, e.g., [2, section 3.2.2], and could be
implemented, e.g., by starting with the natural basis $\{H_{b_k}\mathbf{1}\}$ of $\mathcal{O}_h$ and applying
the Gram-Schmidt procedure at every iteration $n$ to obtain a basis for $Z_n$.
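
The iteration above lends itself to a direct implementation. The following Python sketch is an illustration only, under stated assumptions: the names `observable_space`, `try_add` and `reduce_against` are hypothetical, the rates in `Lam` are assumed to be integers or `Fraction`s so that the arithmetic is exact, and exact Gaussian elimination is used in place of the Gram-Schmidt procedure mentioned in the text. Instead of forming $Z_n$ explicitly, it applies $\Lambda$ and each $H_{b_k}$ to every newly found basis vector until no new directions appear, which yields the same space.

```python
from fractions import Fraction


def reduce_against(basis, v):
    """Eliminate from v the pivot entries of the stored (pivot, vector) pairs."""
    v = list(v)
    for pivot, b in basis:
        if v[pivot] != 0:
            c = v[pivot] / b[pivot]
            v = [x - c * y for x, y in zip(v, b)]
    return v


def try_add(basis, v):
    """Append v (reduced) to the basis if it is independent; return it, else None."""
    r = reduce_against(basis, v)
    for i, x in enumerate(r):
        if x != 0:
            basis.append((i, r))
            return r
    return None


def observable_space(Lam, h):
    """Return a basis of O for generator Lam (d x d) and observation values h (length d)."""
    d = len(h)
    levels = sorted(set(h))
    # The linear maps under which O must be invariant: Lam and every projection H_b.
    maps = [lambda v: [sum(Lam[i][j] * v[j] for j in range(d)) for i in range(d)]]
    for b in levels:
        maps.append(lambda v, b=b: [v[i] if h[i] == b else Fraction(0) for i in range(d)])
    basis, frontier = [], []
    # Start from O_h, spanned by the indicator vectors H_b 1 of the level sets of h.
    for b in levels:
        r = try_add(basis, [Fraction(int(h[i] == b)) for i in range(d)])
        if r is not None:
            frontier.append(r)
    # Close the span under all maps; this stops because the dimension is at most d.
    while frontier:
        v = frontier.pop()
        for m in maps:
            r = try_add(basis, m(v))
            if r is not None:
                frontier.append(r)
    return [vec for _, vec in basis]
```

For instance, with `h = [0, 0, 1]` the generator `[[-1, 1, 0], [0, -2, 2], [1, 0, -1]]` gives $\dim\mathcal{O} = 3$ (observable), whereas `[[-1, 0, 1], [0, -1, 1], [1, 1, -2]]` gives $\dim\mathcal{O} = 2$: in the latter model the first two states behave identically and cannot be told apart from the observations.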

Remark 6.7. In an early paper on filter stability, Delyon and Zeitouni [13] impose
a condition (A2) which, by the previous remark, is seen to be sufficient (but not
necessary) for observability. In addition, they assume ergodicity of the signal
process. Though their condition (A2) was later shown to be superfluous [4, theorem
4.1] in the nondegenerate case $\kappa > 0$, Delyon and Zeitouni show through a
counterexample that when their condition (A2) is not satisfied, the filter may lose its
stability as $\kappa \to 0$. That this cannot happen when condition (A2) is satisfied is to
be expected as, by corollary 4.6 above, observability implies filter stability without
any nondegeneracy or ergodicity assumptions. It does not appear, however, that
our results can be related to the methods used in [13], nor do our results give any
information on the rate of convergence (exponential convergence is proved in [13]).

Remark 6.8. Denote by $C = [H_{b_1}\mathbf{1}, \ldots, H_{b_r}\mathbf{1}]$ the $d \times r$ matrix whose columns are
indicator functions on level sets of $h$. A sufficient (but not necessary) condition
for observability is that $\mathrm{rank}([C\ \ \Lambda C\ \ \Lambda^2 C\ \cdots\ \Lambda^{d-1} C]) = d$, which is the classical
observability test for linear systems. This corresponds to considering only the one
dimensional distributions of $h(X_t)$, rather than all finite dimensional distributions.
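
The rank condition of this remark is easy to evaluate numerically. A minimal sketch, assuming integer or rational transition rates so that the rank can be computed exactly; the function names `exact_rank` and `classical_rank_test` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def exact_rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                c = rows[i][col] / rows[rk][col]
                rows[i] = [a - c * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def classical_rank_test(Lam, h):
    """Rank of [C, Lam C, ..., Lam^(d-1) C], with C the level-set indicators of h."""
    d = len(h)
    cols = [[int(h[i] == b) for i in range(d)] for b in sorted(set(h))]  # columns of C
    vectors = []
    for _ in range(d):
        vectors.extend(cols)
        # Multiply every current column by Lam to obtain the next block.
        cols = [[sum(Lam[i][j] * c[j] for j in range(d)) for i in range(d)] for c in cols]
    # The columns are stored as rows; the rank is unaffected by transposition.
    return exact_rank(vectors)
```

If the returned rank equals $d$ the model is observable; a smaller rank is inconclusive, as the condition is only sufficient.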

6.2. A complete characterization of filter stability when $\kappa > 0$. Corollary
4.6 and the results of the previous section show that

Corollary 6.9. If $\dim\mathcal{O} = d$, then the filter is stable.

The converse, however, is not true. In the nondegenerate case $\kappa > 0$, it was
shown by Baxendale, Chigansky and Liptser [4] that ergodicity of the signal is a
sufficient condition for stability of the filter, regardless of the observation structure. 
It is not difficult to find an example of a filtering model that is not observable, 
but has an ergodic signal (e.g., choose any ergodic signal and set h = 0; another 
example is the one in [13]). 

The goal of this section is to find a necessary and sufficient condition for filter
stability in the nondegenerate case $\kappa > 0$, which we will assume throughout. This
is done by combining our results above with the results from [4]. To gain some
intuition, recall that $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$, or, equivalently, $|(\pi_t^\mu)^*f - (\pi_t^\nu)^*f| \to 0$,
for any $f \in \mathcal{O}$. This implies that as $t \to \infty$, the signed measure $\pi_t^\mu - \pi_t^\nu$ converges
to the nonobservable space $\mathcal{N}$. To ensure stability, we would like to find a condition
under which the space $\mathcal{N}$ converges to zero under the dynamics of the filter.

One plausible condition is to require that the signal itself "forgets" perturbations
in $\mathcal{N}$, i.e., that $|\mathbf{P}^\mu(X_t = a_i) - \mathbf{P}^\nu(X_t = a_i)| \to 0$ as $t \to \infty$ for all $a_i \in \mathbb{S}$
whenever $\mu - \nu \in \mathcal{N}$. In this way, we obtain the natural counterpart of the notion
of detectability in linear systems theory. We will show that this condition is indeed
necessary and sufficient for stability of the filter, provided that $\kappa > 0$.

Before turning to the proof, let us make precise what we are going to show. 
Recall that $\Lambda^*\mathcal{N} \subset \mathcal{N}$; hence it makes sense to speak of the restriction $\Lambda^*|_{\mathcal{N}}$.

Lemma 6.10. Denote $(p_t^\mu)^i = \mathbf{P}^\mu(X_t = a_i)$, and suppose that we have $\dim\mathcal{N} > 0$.
Then the following are equivalent statements.

(1) $|p_t^\mu - p_t^\nu| \to 0$ as $t \to \infty$ whenever $\mu - \nu \in \mathcal{N}$.

(2) $\Lambda^*|_{\mathcal{N}}$ is Hurwitz (its eigenvalues have strictly negative real parts).

(3) $\Lambda^*|_{\mathcal{N}}$ has full rank.

Here $|v|$ denotes the $\ell_1$-norm of the vector $v$.

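Proof. The Kolmogorov forward equation states that

$$\frac{d}{dt}p_t^\mu = \Lambda^* p_t^\mu, \qquad p_0^\mu = \mu.$$

Hence, in particular,

$$\frac{d}{dt}(p_t^\mu - p_t^\nu) = \Lambda^*(p_t^\mu - p_t^\nu) = \Lambda^*|_{\mathcal{N}}(p_t^\mu - p_t^\nu), \qquad p_0^\mu - p_0^\nu = \mu - \nu \in \mathcal{N},$$

where the second equality follows immediately from the fact that this equation
leaves $\mathcal{N}$ invariant. It is well known from linear systems theory that the solution of
this equation decays to zero as $t \to \infty$ for every initial condition $\mu - \nu \in \mathcal{N}$ if and
only if $\Lambda^*|_{\mathcal{N}}$ is Hurwitz. The fact that $\Lambda^*|_{\mathcal{N}}$ is Hurwitz if and only if it is of full
rank follows from the fact that any nonzero eigenvalue of $\Lambda^*$ has strictly negative
real part [H pages 52-53]. □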

Our previous discussion now motivates the following definition. 

Definition 6.11. The signal-observation model is called detectable if it is either
observable or any of the equivalent conditions of lemma 6.10 hold.
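
For a finite state model, this definition reduces to linear algebra. A minimal sketch, assuming that a basis of $\mathcal{O}$ is already available (for example from the iteration of remark 6.6) and that all entries are integers or fractions; the function names are hypothetical. It computes $\mathcal{N} = \mathcal{O}^\perp$ and then tests the rank condition of lemma 6.10, i.e., whether $\Lambda^*$ restricted to $\mathcal{N}$ has full rank.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form; returns (nonzero rows, pivot columns)."""
    rows = [[Fraction(x) for x in r] for r in rows]
    pivots, rk = [], 0
    for col in range(ncols):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        p = rows[rk][col]
        rows[rk] = [a / p for a in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                c = rows[i][col]
                rows[i] = [a - c * b for a, b in zip(rows[i], rows[rk])]
        pivots.append(col)
        rk += 1
    return rows[:rk], pivots


def null_space(rows, ncols):
    """Basis of {v : r . v = 0 for every row r}, i.e. the orthogonal complement."""
    R, pivots = rref(rows, ncols)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for r, p in zip(R, pivots):
            v[p] = -r[free]
        basis.append(v)
    return basis


def is_detectable(Lam, O_basis):
    """True if the model is observable or Lam^* is injective on N = O-perp."""
    d = len(Lam)
    N = null_space(O_basis, d)
    if not N:
        return True  # observable: N = {0}
    # Apply Lam^* (the transpose) to each basis vector of N.
    images = [[sum(Lam[j][i] * v[j] for j in range(d)) for i in range(d)] for v in N]
    R, _ = rref(images, d)
    return len(R) == len(N)
```

As an example, a chain with two absorbing states and a constant observation function (`Lam = [[0, 0], [0, 0]]`, `O_basis = [[1, 1]]`) is not detectable, while the non-observable ergodic model `Lam = [[-1, 0, 1], [0, -1, 1], [1, 1, -2]]` with `O_basis = [[1, 1, 0], [0, 0, 1]]` is.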

The goal of this section is to prove the following theorem. 

Theorem 6.12. Suppose that $\kappa > 0$. Then $|\pi_t^\mu - \pi_t^\nu| \xrightarrow{t\to\infty} 0$ $\mathbf{P}^\mu$-a.s. whenever
$\mu \ll \nu$ if and only if the signal-observation model is detectable.

Remark 6.13. The situation for $\kappa = 0$ appears to be more complicated, and the
theorem does not hold in this case. A counterexample can be found in [U section
3] (see also [13]), which discusses a model that is certainly detectable, but the filter
is not stable when $\kappa = 0$ due to a sort of "geometric obstruction". Problems of this
sort, in a somewhat different setting, date back to the work of Kaijser [18], and some
recent progress on that problem can be found in [20]. A complete understanding of
this case is still lacking, however.

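6.2.1. Necessity. To prove theorem 6.12 we begin by showing that detectability is
a necessary condition for the stability of the filter.

Lemma 6.14. Suppose that $|\pi_t^\mu - \pi_t^\nu| \xrightarrow{t\to\infty} 0$ $\mathbf{P}^\mu$-a.s. for any $\mu \ll \nu$. Then the
signal-observation model is detectable.

Proof. Let $\mu - \nu \in \mathcal{N}$; then $\mathbf{P}^\mu$ and $\mathbf{P}^\nu$ are identical on $\mathcal{F}^Y$. As $\pi_t^\nu(f)$ is
$\mathcal{F}^Y$-measurable, $\mathbf{E}^\mu(\pi_t^\nu(f)) = \mathbf{E}^\nu(\pi_t^\nu(f))$. Thus

$$\mathbf{E}^\mu|\pi_t^\mu(f) - \pi_t^\nu(f)| \ge |\mathbf{E}^\mu(\pi_t^\mu(f) - \pi_t^\nu(f))| = |\mathbf{E}^\mu(f(X_t)) - \mathbf{E}^\nu(f(X_t))| \quad \text{for any } f.$$

Now suppose the model is not detectable, i.e., there exists $v \in \mathcal{N}$ so that $|p_t^\mu - p_t^\nu| \not\to 0$
when $\mu - \nu \propto v$. Choose $\nu = (v^+ + v^-)/2v^+(\mathbb{S})$ and $\mu = v^+/v^+(\mathbb{S})$; then
$\mu, \nu \in \mathcal{P}(\mathbb{S})$, $\mu \ll \nu$ and $\mu - \nu \propto v$. As $p_t^\mu - p_t^\nu$ does not converge to zero, there
must exist a function $f \in C_b(\mathbb{S})$ and a sequence of times $t_n \nearrow \infty$ such that

$$|\mathbf{E}^\mu(f(X_{t_n})) - \mathbf{E}^\nu(f(X_{t_n}))| \ge a > 0.$$

Combined with the inequality above, this gives

$$\limsup_{t\to\infty} \mathbf{E}^\mu|\pi_t^\mu(f) - \pi_t^\nu(f)| > 0,$$

which by dominated convergence is incompatible with $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$ $\mathbf{P}^\mu$-a.s. □
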

Remark 6.15. The previous proof does not use at all the fact that $\mathbb{S}$ is a finite set
or that the observations are of the white noise type. Indeed, let us call a general
model detectable if $\mu - \nu \in \mathcal{N}$ implies that $|\mathbf{E}^\mu(f(X_t)) - \mathbf{E}^\nu(f(X_t))| \to 0$ as $t \to \infty$
for any $f \in C_b(\mathbb{S})$. Then precisely the same proof shows that detectability is a
necessary condition for the stability of the filter (it is not even necessary to assume
that $\mathbb{S}$ is compact). The difficult part is to establish that detectability is a sufficient
condition for stability of the filter, and this is what we will do below for finite state
signals with nondegenerate white noise type observations.

6.2.2. Sufficiency: no transient states. We now proceed to prove that detectability
is also a sufficient condition for stability when $\kappa > 0$. Throughout this and the
following subsection we always assume that the signal-observation model is detectable.

In the proofs, we make use of the partition of the state space $\mathbb{S}$ into $M < \infty$
ergodic classes $U_i$, $i = 1, \ldots, M$ and a transient class $T$, so that $\mathbb{S}$ is the disjoint
union of these sets. Any Markov chain can be uniquely decomposed in this way.
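
This decomposition can be computed directly from the sign pattern of the transition intensities. A minimal sketch, assuming the chain is specified by a $d \times d$ rate matrix `Lam` given as nested lists; the function name `classify_states` is hypothetical. A state is transient if it can reach some state from which it cannot be reached back; the remaining states are grouped into ergodic classes by mutual reachability.

```python
def classify_states(Lam):
    """Split states 0..d-1 into ergodic classes and the transient class."""
    d = len(Lam)
    # reach[i][j] is True if j can be reached from i in zero or more jumps.
    reach = [[i == j or Lam[i][j] > 0 for j in range(d)] for i in range(d)]
    for k in range(d):  # transitive closure (Warshall's algorithm)
        for i in range(d):
            for j in range(d):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    ergodic, transient, seen = [], [], set()
    for i in range(d):
        if any(reach[i][j] and not reach[j][i] for j in range(d)):
            transient.append(i)  # i can leave for good, so it is transient
        elif i not in seen:
            cls = [j for j in range(d) if reach[i][j]]  # closed communicating class
            seen.update(cls)
            ergodic.append(cls)
    return ergodic, transient
```

For example, `classify_states([[-1, 1, 0], [0, 0, 0], [0, 0, 0]])` yields two ergodic classes `[1]` and `[2]` (both absorbing) and the transient class `[0]`.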

Lemma 6.16. If $\Lambda F = 0$ for a function $F$, then $F \in \mathcal{O}$.

Proof. $\Lambda F = 0$ is equivalent to $e^{\Lambda t}F = F$ for all $t \ge 0$. In particular, this implies
that $(p_t^\mu)^*F = \mu^*F$ for all $t \ge 0$. Now suppose that $F \notin \mathcal{O}$; then there exist
$\mu, \nu$ such that $\mu - \nu \in \mathcal{N}$ and $|\mu^*F - \nu^*F| = a > 0$. In particular, we find that
$|(p_t^\mu)^*F - (p_t^\nu)^*F| = a$ for all $t \ge 0$. But detectability implies that $|p_t^\mu - p_t^\nu| \to 0$ as
$t \to \infty$ for $\mu - \nu \in \mathcal{N}$, so we have a contradiction. □

Corollary 6.17. Suppose $T = \varnothing$. Then $I_{U_i} \in \mathcal{O}$ for any $i = 1, \ldots, M$.

Proof. It is easily seen that $\Lambda I_{U_i} = 0$ when there are no transient states. Hence the
statement follows from the previous lemma. □

Suppose that $T = \varnothing$. The essential consequence of detectability is that as $t \to \infty$,
we will be able to determine precisely in which of the ergodic classes $U_1, \ldots, U_M$
the signal started at $t = 0$. Following the logic of [4], this will cause the filter to be
stable when combined with the fact that the filter is stable for ergodic signals. We
will deal with the transient states separately in the next subsection, and assume
for now that there are no such states (or, equivalently, that we work with initial
densities that are supported on the ergodic classes only).

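Lemma 6.18. $\mathbf{E}^\nu(I_{U_i}(X_0)|\mathcal{F}_t^Y) \xrightarrow{t\to\infty} I_{U_i}(X_0)$ $\mathbf{P}^\nu$-a.s., provided that there are no
transient states, $T = \varnothing$.

Proof. For any $j$ such that $\nu(U_j) > 0$, denote by $\nu_j = I_{U_j}\nu/\nu(U_j)$. Then $\nu_j \ll \nu$
and $\pi_t^{\nu_j}(I_{U_i}) = \delta_{ij}$. But by theorem 4.4 $\pi_t^{\nu_j}(I_{U_i}) - \pi_t^\nu(I_{U_i}) \to 0$ as $t \to \infty$ $\mathbf{P}^{\nu_j}$-a.s.,
as $I_{U_i} \in \mathcal{O}$. In other words, $\mathbf{P}^\nu(\pi_t^\nu(I_{U_i}) \not\to \delta_{ij}$ and $X_0 \in U_j) = 0$, so $\pi_t^\nu(I_{U_i}) \to \delta_{ij}$
on $\{\omega : X_0 \in U_j\}$, modulo a $\mathbf{P}^\nu$-null set. Finally, note that $I_{U_i}(X_t) = I_{U_i}(X_0)$
$\mathbf{P}^\nu$-a.s., as the ergodic classes do not communicate. □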



We can now prove sufficiency for the special case $T = \varnothing$.

Lemma 6.19. Suppose $T = \varnothing$ and $\mu \ll \nu$. Then $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$ $\mathbf{P}^\mu$-a.s. for
all $f \in C_b(\mathbb{S})$.

Proof. By the Bayes formula, we find that $\mathbf{P}^\mu$-a.s.

$$\mathbf{E}^\mu(f(X_t)|\mathcal{F}_t^Y) = \sum_{j=1}^M \mathbf{E}^{\mu_j}(f(X_t)|\mathcal{F}_t^Y)\,\mathbf{P}^\mu(X_0 \in U_j|\mathcal{F}_t^Y),$$

where $\mu_j = I_{U_j}\mu/\mu(U_j)$. The same equation holds with $\mu, \mu_j$ replaced by $\nu, \nu_j$. The result now follows easily
from the previous lemma and the fact that $|\mathbf{E}^{\mu_j}(f(X_t)|\mathcal{F}_t^Y) - \mathbf{E}^{\nu_j}(f(X_t)|\mathcal{F}_t^Y)| \to 0$
by [4, theorem 4.1] (as $X_t$ is supported entirely in the ergodic class $U_j$ under the
initial measures $\mu_j, \nu_j$). □

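6.2.3. Sufficiency: general case. We now consider the general case with $T \ne \varnothing$. Let
us begin by showing that the transient states themselves decay as $t \to \infty$.

Lemma 6.20. $\pi_t^\nu(I_T) \to 0$ as $t \to \infty$ $\mathbf{P}^\nu$-a.s.

Proof. Note that $I_T(X_t) \to 0$ $\mathbf{P}^\nu$-a.s., as the transient states must decay eventually
into one of the ergodic classes. Now write

$$\pi_t^\nu(I_T) = \mathbf{E}^\nu(I_T(X_t)|\mathcal{F}_t^Y) \le \mathbf{E}^\nu\Big(\sup_{u \ge n} I_T(X_u)\,\Big|\,\mathcal{F}_t^Y\Big)$$

for all $t \ge n$ $\mathbf{P}^\nu$-a.s. (using the càdlàg paths to eliminate the time dependence of
the null set). Hence

$$\limsup_{t\to\infty} \mathbf{E}^\nu(I_T(X_t)|\mathcal{F}_t^Y) \le \mathbf{E}^\nu\Big(\sup_{u \ge n} I_T(X_u)\,\Big|\,\mathcal{F}^Y\Big) \quad \mathbf{P}^\nu\text{-a.s.}$$

The claim follows by letting $n \to \infty$ using dominated convergence. □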

Evidently, as $t \to \infty$, the conditional measures $\pi_t^\mu$ and $\pi_t^\nu$ converge to measures
that are supported on the ergodic classes $U = \mathbb{S} \setminus T = \bigcup_{i=1}^M U_i$. On the other hand,
if we start with $\mu \ll \nu$ which are already supported on $U$, then $|\pi_t^\mu - \pi_t^\nu| \to 0$ by
lemma 6.19. This strongly suggests that we should have $|\pi_t^\mu - \pi_t^\nu| \to 0$ for any
$\mu \ll \nu$. Our goal is to prove this assertion.

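Lemma 6.21. Suppose that $\mu \ll \nu$ and that $\mu$ is supported on $U$. Then

$$\mathbf{E}^\mu\Big(\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)|\Big) \le \mathrm{osc}(f)\,\|d\mu/d\nu\|_\infty\,\nu(T),$$

where $\mathrm{osc}(f) = \max(f) - \min(f)$.

Proof. Let us write $\nu_U = I_U\nu/\nu(U)$ and $\nu_T = I_T\nu/\nu(T)$. By the Bayes formula, we
find that $\mathbf{P}^\nu$-a.s.

$$\pi_t^\nu(f) = \pi_t^{\nu_U}(f)\,\mathbf{P}^\nu(X_0 \in U|\mathcal{F}_t^Y) + \pi_t^{\nu_T}(f)\,\mathbf{P}^\nu(X_0 \in T|\mathcal{F}_t^Y).$$

It follows directly that $\mathbf{P}^\mu$-a.s.

$$|\pi_t^{\nu_U}(f) - \pi_t^\nu(f)| = |\pi_t^{\nu_U}(f) - \pi_t^{\nu_T}(f)|\,\mathbf{P}^\nu(X_0 \in T|\mathcal{F}_t^Y),$$

so that in particular

$$\limsup_{t\to\infty} |\pi_t^{\nu_U}(f) - \pi_t^\nu(f)| \le \mathrm{osc}(f)\,\mathbf{P}^\nu(X_0 \in T|\mathcal{F}^Y).$$

We thus compute

$$\mathbf{E}^\mu(\mathbf{P}^\nu(X_0 \in T|\mathcal{F}^Y)) \le \|d\mu/d\nu\|_\infty\,\mathbf{E}^\nu(\mathbf{P}^\nu(X_0 \in T|\mathcal{F}^Y)) = \|d\mu/d\nu\|_\infty\,\nu(T).$$

The claim now follows from lemma 6.19 and

$$\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)| \le \limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^{\nu_U}(f)| + \limsup_{t\to\infty} |\pi_t^{\nu_U}(f) - \pi_t^\nu(f)|,$$

using the fact that $\mu$ and $\nu_U$ are both supported on $U$. □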

To establish that the right-hand side in the expression in this lemma can be 
chosen to be zero, we will use the Markov property of the filter. 

Lemma 6.22. For $\mu \ll \nu$, the pair $(\pi_t^\mu, \pi_t^\nu)$ is a Feller-Markov process under $\mathbf{P}^\mu$.

Proof. Recall that $dB_t^\mu = \kappa^{-1}(dY_t - \pi_t^\mu(h)\,dt)$, the innovations process, is an
$\mathcal{F}_t^Y$-Wiener process under $\mathbf{P}^\mu$, and that we thus have

$$d\pi_t^\mu = \Lambda^*\pi_t^\mu\,dt + \kappa^{-1}(H - h^*\pi_t^\mu)\,\pi_t^\mu\,dB_t^\mu,$$
$$d\pi_t^\nu = \Lambda^*\pi_t^\nu\,dt + \kappa^{-1}(H - h^*\pi_t^\nu)\,\pi_t^\nu\,(dB_t^\mu + \kappa^{-1}h^*\pi_t^\mu\,dt - \kappa^{-1}h^*\pi_t^\nu\,dt),$$

where $H = \mathrm{diag}(h)$ and $(\pi_0^\mu, \pi_0^\nu) = (\mu, \nu)$. For these facts, see, e.g., [25]. Being
the solution of a stochastic differential equation with Lipschitz coefficients (the
coefficients are bounded in the double simplex $\Delta^d \times \Delta^d$, and the first exit time
from the simplex is infinite), it is well known that there is a unique strong solution
which satisfies the Markov and Feller properties. □

A particular consequence of this lemma is the following. Consider the pair of
$\Delta^d$-valued stochastic differential equations

$$d\pi_t = \Lambda^*\pi_t\,dt + \kappa^{-1}(H - h^*\pi_t)\,\pi_t\,dW_t,$$
$$d\tilde\pi_t = \Lambda^*\tilde\pi_t\,dt + \kappa^{-1}(H - h^*\tilde\pi_t)\,\tilde\pi_t\,(dW_t + \kappa^{-1}h^*\pi_t\,dt - \kappa^{-1}h^*\tilde\pi_t\,dt),$$

where $W_t$ is a standard Wiener process. The solutions of this stochastic differential
equation can be realized on the canonical path space $\Omega = D([0,\infty[;\Delta^d) \times
D([0,\infty[;\Delta^d)$ such that $\pi_t(u,v) = u(t)$ and $\tilde\pi_t(u,v) = v(t)$ are the canonical
processes, and with a family of measures $\mathbf{P}_{(\mu,\nu)}$ under which $(\pi_t, \tilde\pi_t)$ solve the
stochastic differential equation above for the initial condition $(\pi_0, \tilde\pi_0) = (\mu, \nu)$.
We can subsequently introduce the natural filtration $\mathcal{G}_t = \sigma\{(\pi_s, \tilde\pi_s) : s \le t\}$,
augmented as usual with respect to the family $\mathbf{P}_{(\mu,\nu)}$, and the canonical shift
$\theta_t(u,v)(s) = (u(s+t), v(s+t))$, such that the process $(\pi_t, \tilde\pi_t)$ satisfies the usual
Markov property with respect to the filtration $\mathcal{G}_t$ and the family $\mathbf{P}_{(\mu,\nu)}$. From the
proof of the previous lemma, it follows that for any $\mu \ll \nu$, the law of the process
$(\pi_t, \tilde\pi_t)$ under $\mathbf{P}_{(\mu,\nu)}$ coincides with the law of the process $(\pi_t^\mu, \pi_t^\nu)$ under $\mathbf{P}^\mu$. In
particular, our previous results can be applied to the process $(\pi_t, \tilde\pi_t)$, and to establish
stability it suffices to demonstrate the corresponding property for the latter.

Remark 6.23. The construction of $(\pi_t, \tilde\pi_t)$ on its own path space is certainly not
necessary, but helps alleviate some notational confusion. In particular, we will be
using the Markov property of the filter, whereas our previous notation is geared toward
the Markov property of the signal-observation pair.

Combined with lemma 6.21 we can now establish the following.

Lemma 6.24. Suppose that $\mu \ll \nu$ and that $\mu$ is supported on $U$. Then it follows
that $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$ $\mathbf{P}^\mu$-a.s. for any $f \in C_b(\mathbb{S})$.
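Proof. Using the Markov property, we can write

$$\mathbf{E}_{(\mu,\nu)}\Big(\limsup_{t\to\infty} |\pi_t(f) - \tilde\pi_t(f)|\,\Big|\,\mathcal{G}_s\Big) = \mathbf{E}_{(\pi_s,\tilde\pi_s)}\Big(\limsup_{t\to\infty} |\pi_t(f) - \tilde\pi_t(f)|\Big),$$

where we have used the fact that the random variable $\limsup_{t\to\infty} |\pi_t(f) - \tilde\pi_t(f)|$
is invariant under the shift $\theta_s$. By [jJJ lemma 2.1] we find that $\pi_s \ll \tilde\pi_s$ $\mathbf{P}_{(\mu,\nu)}$-a.s.
whenever $\mu \ll \nu$, whereas clearly $\pi_s$ is $\mathbf{P}_{(\mu,\nu)}$-a.s. supported on $U$ whenever $\mu$ is
supported on $U$. Hence we can invoke lemma 6.21, and we find that

$$\mathbf{E}_{(\mu,\nu)}\Big(\limsup_{t\to\infty} |\pi_t(f) - \tilde\pi_t(f)|\,\Big|\,\mathcal{G}_s\Big) \le \mathrm{osc}(f)\,\|d\pi_s/d\tilde\pi_s\|_\infty\,\tilde\pi_s(I_T),$$

where $\|d\pi_s/d\tilde\pi_s\|_\infty = \max_i\,(d\pi_s/d\tilde\pi_s)^i$. In particular, this implies that

$$\mathbf{E}^\mu\Big(\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)|\,\Big|\,\mathcal{F}_s^Y\Big) \le \mathrm{osc}(f)\,\|d\pi_s^\mu/d\pi_s^\nu\|_\infty\,\pi_s^\nu(I_T).$$

Now note that (see, e.g., [HI lemma 2.1])

$$\Big\|\frac{d\pi_s^\mu}{d\pi_s^\nu}\Big\|_\infty = \max_i \frac{\mathbf{E}^\nu\big(\frac{d\mu}{d\nu}(X_0)\,\big|\,\mathcal{F}_s^Y, X_s = a_i\big)}{\mathbf{E}^\nu\big(\frac{d\mu}{d\nu}(X_0)\,\big|\,\mathcal{F}_s^Y\big)} \le \frac{1}{\varrho_s}\,\Big\|\frac{d\mu}{d\nu}\Big\|_\infty, \qquad \varrho_s = \mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0)\,\Big|\,\mathcal{F}_s^Y\Big).$$

Hence, letting $s \to \infty$ (with $\varrho_\infty = \lim_{s\to\infty}\varrho_s$) and using lemma 6.20, we find that

$$\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)| = 0 \quad \text{on } \{\omega : \varrho_\infty(\omega) > 0\} \setminus N,$$

where $\mathbf{P}^\mu(N) = 0$. But $\mathbf{P}^\mu(\varrho_\infty > 0) = 1$, so the result follows. □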



We can now finally complete the proof. It is important to remember that we 
have assumed detectability throughout this subsection. 

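Proposition 6.25. Suppose the signal-observation model is detectable. If $\mu \ll \nu$,
then $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$ $\mathbf{P}^\mu$-a.s. for any $f \in C_b(\mathbb{S})$.

Proof. Write $\mu_U = I_U\mu/\mu(U)$. By the previous lemma, we find that $|\pi_t^{\mu_U}(f) - \pi_t^\mu(f)| \to 0$ and
$|\pi_t^{\mu_U}(f) - \pi_t^\nu(f)| \to 0$ $\mathbf{P}^{\mu_U}$-a.s. Hence, using the triangle inequality, $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$
$\mathbf{P}^{\mu_U}$-a.s. But this implies, as in the proof of lemma 6.18, that $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$
on $\{\omega : X_0 \in U\}$, modulo a $\mathbf{P}^\mu$-null set. In particular, we can then estimate

$$\mathbf{E}^\mu\Big(\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)|\Big) = \mathbf{E}^\mu\Big(\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)|\,I_T(X_0)\Big) \le \mathrm{osc}(f)\,\mu(T).$$

To proceed, we apply the Markov property as in the previous proof. This yields

$$\mathbf{E}^\mu\Big(\limsup_{t\to\infty} |\pi_t^\mu(f) - \pi_t^\nu(f)|\,\Big|\,\mathcal{F}_s^Y\Big) \le \mathrm{osc}(f)\,\pi_s^\mu(I_T).$$

The result follows by letting $s \to \infty$ and using lemma 6.20. □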

7. Strong stability for nondegenerate white noise type observations 

7.1. Strong stability. Up to this point, we have always assumed that the initial
measures of interest are absolutely continuous, $\mu \ll \nu$. In this section we consider
the case when $\mu \not\ll \nu$. As explained in remark 4.8, the filter stability problem is
in general not even well defined for such initial measures, and the characterization
of strong stability (the stability of the filter for arbitrary initial conditions)
requires choosing a particular version of the conditional expectations. In the case
of nondegenerate white noise type observations, however, there is a natural choice
of version, viz. the one provided by the Kallianpur-Striebel formula, whose
construction we now briefly recall (see, e.g., [25, section 7.9]).
We consider the generic white noise type observation model 



$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + KB_t$$

of section 5.1. Beside the assumptions of section 5.1, however, we additionally
assume nondegeneracy of the observation process, i.e., we assume that $K$ is an
invertible matrix. The importance of this requirement stems from the fact that it 
allows us to use Girsanov's theorem to remove the dependence of the observations 
on the signal process, a fact which will be exploited shortly. 

Denote by $\mathbf{Q}^\mu$ the measure on the space of signal sample paths $\Omega^X$ such that
the canonical process $X_t(x) = x(t)$ has the law of the signal process with initial
distribution $\mu$, i.e., $\mathbf{Q}^\mu$ is the marginal of $\mathbf{P}^\mu$ on $\Omega^X$. Moreover, we denote by $\mathbf{W}$
the Wiener measure on $\Omega^Y$ with covariance $KK^*$ (i.e., $Y_t(y) = y(t)$ is a Wiener
process with covariance $KK^*$ under $\mathbf{W}$). Then, by Girsanov's theorem, the density of $\mathbf{P}^\mu$
with respect to $\mathbf{Q}^\mu \times \mathbf{W}$, restricted to the time interval $[0,t]$, is given by

$$Z_t(x,y) = \exp\Big(\int_0^t h(X_s)^*(KK^*)^{-1}\,dY_s - \frac{1}{2}\int_0^t h(X_s)^*(KK^*)^{-1}h(X_s)\,ds\Big).$$

Thus, using the Bayes formula, we obtain the following characterization of the filter:

$$\pi_t^\mu(f)(y) = \mathbf{E}^\mu(f(X_t)|\mathcal{F}_t^Y)(x,y) = \frac{\int f(x(t))\,Z_t(x,y)\,\mathbf{Q}^\mu(dx)}{\int Z_t(x,y)\,\mathbf{Q}^\mu(dx)} \quad \mathbf{P}^\mu\text{-a.s.}$$

This is the Kallianpur-Striebel formula. Note, however, that the expression on
the right hand side depends only on the observation sample paths $\Omega^Y$ in the time
interval $[0,t]$, and is well defined not only $\mathbf{P}^\mu$-a.s. but in fact for $\mathbf{W}$-a.e. $y \in \Omega^Y$.
Moreover, it is easily seen from Girsanov's theorem that the observation marginals
satisfy $\mathbf{P}^\mu|_{\mathcal{F}_T^Y} \sim \mathbf{W}|_{\mathcal{F}_T^Y}$ for any $T < \infty$, regardless of the initial measure $\mu$. Hence
the Kallianpur-Striebel formula defines a version of $\pi_t^\mu$ which is $\mathbf{P}^\nu$-a.s. uniquely
defined for any $\nu$ (even when $\mu$ and $\nu$ are not absolutely continuous). In the
remainder of this section, $\pi_t^\mu$ will always imply this particular version.

Having now chosen a version of the filter that is well defined under any measure 
P M , strong stability can be meaningfully defined. 

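Definition 7.1. The filtering model is strong stable if for any $\mu, \nu, \gamma \in \mathcal{P}(\mathbb{S})$,

$$|\pi_t^\mu(f) - \pi_t^\nu(f)| \xrightarrow{t\to\infty} 0 \quad \mathbf{P}^\gamma\text{-a.s. for all } f \in C_b(\mathbb{S}).$$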

7.2. A complete characterization in the finite state case. In the finite state
setting, the condition $\mu \ll \nu$ is not really restrictive in practice. Indeed, if we
wish to ensure that the filter is asymptotically insensitive to its initial condition,
it suffices to initialize the filter with a (possibly incorrect) initial distribution $\nu$
which charges every point in the state space, e.g., the uniform distribution on $\mathbb{S}$.
As any measure on $\mathbb{S}$ is absolutely continuous with respect to such a measure, the
convergence of the thus initialized filter to the optimal one is ensured, regardless of
the true initial distribution $\mu$, provided that the model is detectable and $\kappa > 0$.
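The effect of initializing with a distribution that charges every state can be illustrated numerically. The sketch below is a discrete-time analogue (an assumption made for simplicity, not the continuous-time filter of this paper): a two-state chain with a strictly positive transition matrix, observed in Gaussian noise. One filter starts from the true initial law $\mu = \delta_1$, the other from the uniform law $\nu$, and both process the same observations. The names `filter_step` and `gaussian_likelihoods` and all numerical values are hypothetical.

```python
import math
import random

def filter_step(p, transition, likelihood):
    """One predict/correct step of the finite-state (discrete-time) filter."""
    d = len(p)
    predicted = [sum(p[i] * transition[i][j] for i in range(d)) for j in range(d)]
    unnormalized = [predicted[j] * likelihood[j] for j in range(d)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def gaussian_likelihoods(y, h, sigma):
    """Likelihood of the observation y under each state (up to a common constant)."""
    return [math.exp(-0.5 * ((y - hj) / sigma) ** 2) for hj in h]

rng = random.Random(1)
transition = [[0.9, 0.1], [0.2, 0.8]]   # one ergodic class, no transient states
h, sigma = [0.0, 1.0], 0.5

x = 1                                    # true initial state, i.e. mu = delta_1
optimal = [0.0, 1.0]                     # filter started from the true mu
misspecified = [0.5, 0.5]                # filter started from the uniform nu

gaps = []
for _ in range(200):
    x = 0 if rng.random() < transition[x][0] else 1
    y = h[x] + sigma * rng.gauss(0.0, 1.0)
    lik = gaussian_likelihoods(y, h, sigma)
    optimal = filter_step(optimal, transition, lik)
    misspecified = filter_step(misspecified, transition, lik)
    gaps.append(sum(abs(a - b) for a, b in zip(optimal, misspecified)))
final_gap = gaps[-1]
```

In this toy model the gap between the two filters shrinks geometrically, because a strictly positive transition matrix contracts the distance between the two conditional laws at every step.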

Nonetheless the strong stability property is of interest, and can be character- 
ized completely as we did for the stability property. Somewhat surprisingly, the 
observation structure no longer plays a role in this setting. 
Theorem 7.2. Suppose that $\kappa > 0$. Then $|\pi_t^\mu - \pi_t^\nu| \xrightarrow{t\to\infty} 0$ $\mathbf{P}^\gamma$-a.s. for any $\mu, \nu, \gamma$
if and only if the signal process has only one ergodic class.

Before proceeding with the proof of the theorem, let us make the following
important remark.

Remark 7.3. The stochastic differential equations used in the proof of lemma 6.22
can be obtained directly from the Kallianpur-Striebel formula, and hence define
the version of $(\pi_t^\mu, \pi_t^\nu)$ which we use in this section. In particular, this implies that
the arguments based on the Markov property of $(\pi_t^\mu, \pi_t^\nu)$ continue to hold in the
current setting and the condition $\mu \ll \nu$ of lemma 6.22 is no longer required.



Let us first prove the necessity part of the theorem. We assume throughout that
$\kappa > 0$ and that $\pi_t^\mu$, $\pi_t^\nu$ are chosen to be the Kallianpur-Striebel versions.



Lemma 7.4. Suppose that $|\pi_t^\mu - \pi_t^\nu| \xrightarrow{t\to\infty} 0$ $\mathbf{P}^\gamma$-a.s. for any $\mu, \nu, \gamma$. Then the
signal process has only one ergodic class.

Proof. Suppose that the signal process has two ergodic classes $U_1$ and $U_2$. Choose
$\mu$ to be any distribution that is supported on $U_1$, and $\nu$ to be any distribution that
is supported on $U_2$. Then it is easily verified that $\pi_t^\mu(U_1) = 1$ $\mathbf{W}$-a.s. for all times
$t$, while $\pi_t^\nu(U_1) = 0$ $\mathbf{W}$-a.s. for all times $t$. Hence we have a contradiction. □
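The mechanism of this proof can be checked on a toy example. The sketch below again uses a discrete-time analogue (an assumption, not the continuous-time setting of the lemma) with two absorbing states, so that each state forms its own ergodic class. Whatever positive likelihoods the observations produce, a filter started on one class never assigns mass to the other. The function name and the likelihood values are hypothetical.

```python
def filter_step(p, transition, likelihood):
    """One predict/correct step of the finite-state (discrete-time) filter."""
    d = len(p)
    predicted = [sum(p[i] * transition[i][j] for i in range(d)) for j in range(d)]
    unnormalized = [predicted[j] * likelihood[j] for j in range(d)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Two absorbing states: each of {0} and {1} is its own ergodic class.
transition = [[1.0, 0.0], [0.0, 1.0]]
pi_mu = [1.0, 0.0]   # mu supported on the first class
pi_nu = [0.0, 1.0]   # nu supported on the second class

# Arbitrary strictly positive likelihoods stand in for the observation updates.
for likelihood in ([0.3, 0.7], [0.9, 0.1], [0.5, 0.5]):
    pi_mu = filter_step(pi_mu, transition, likelihood)
    pi_nu = filter_step(pi_nu, transition, likelihood)
```

The two filters therefore stay at total variation distance 2 for all times, so no form of strong stability can hold.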

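We now proceed to prove sufficiency. First, note that it suffices to prove that
$|\pi_t^\mu - \pi_t^\nu| \to 0$ $\mathbf{P}^\mu$-a.s. for any $\mu, \nu$. Indeed, it then follows that

$$|\pi_t^\mu - \pi_t^\nu| \le |\pi_t^\mu - \pi_t^\gamma| + |\pi_t^\gamma - \pi_t^\nu| \xrightarrow{t\to\infty} 0 \qquad \mathbf{P}^\gamma\text{-a.s.}$$

for any $\mu, \nu, \gamma$ by the triangle inequality. As before, it is easier to first consider the
case with no transient states $T = \emptyset$.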

Lemma 7.5. Suppose the signal process is ergodic (in particular $T = \emptyset$). Then
$|\pi_t^\mu - \pi_t^\nu| \to 0$ $\mathbf{P}^\mu$-a.s. for any $\mu, \nu$.

Proof. This is precisely [4, theorem 4.1]. □

Moreover, we need the following lemma. 

Lemma 7.6. Suppose the signal process has only one ergodic class $U$. Then
$\pi_t^\mu(a_i) > 0$ a.s. for all $a_i \in U$ and $t > 0$, regardless of $\mu$.

Proof. By [23, eq. (7.205)], we have $\pi_t^\mu(a_i) > 0$ a.s. if and only if $\mathbf{P}^\mu(X_t = a_i) > 0$.
But in the absence of multiple ergodic classes, it must be the case that $\mathbf{P}^\mu(X_t = a_i) > 0$
for any $a_i \in U$ as soon as $t > 0$, regardless of $\mu$. □

The transient states can now be eliminated precisely as in proposition 6.25,
completing the proof.

Lemma 7.7. Suppose the signal process has only one ergodic class. Then we have
$|\pi_t^\mu - \pi_t^\nu| \to 0$ $\mathbf{P}^\mu$-a.s. for any $\mu, \nu$.

Proof. First, note that as in the proof of lemma 6.24 we can write

$$\mathbf{E}_{(\mu,\nu)}\Big(\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)|\,\Big|\,\mathcal{G}_s\Big) = \mathbf{E}_{(\pi_s,\bar\pi_s)}\Big(\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)|\Big),$$

where we have used the Markov property and the fact that the random variable
$\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)|$ is invariant under the shift $\theta_s$. But by the previous
lemma, we find that $(\pi_t)_U \sim (\bar\pi_t)_U$ for any $t > 0$. Thus we may assume, without
loss of generality, that $\mu_U \sim \nu_U$ in the following.

By lemma we find that \ir^(f) - 7rf(/)| -> P^-a.s., and also that 

Wt°(f) " ^(Z) ~^ P^-a.s. But from the assumption /iy ~ i/u, it follows that 
PW - P"". Hence, using the triangle inequality, |7if (/) - 7if (/)| -> P w -a.s. 
But this implies, as in the proof of lemma 6.18, that $|\pi_t^\mu(f) - \pi_t^\nu(f)| \to 0$ on
$\{\omega : X_0 \in U\}$, modulo a $\mathbf{P}^\mu$-null set. In particular, we can then estimate



7.3. A general criterion. In this section, we will give a sufficient condition for 
strong stability for a general signal process (with nondegenerate white noise obser- 
vations as above). We will need the following definitions. 

Definition 7.8. Let $R_t$ be a Markov semigroup on a Polish space with associated
transition probabilities $p_t(x, A)$. Then $R_t$ is called

• strong Feller if $R_t f$ is continuous for every bounded measurable $f$;

• irreducible if for any nonempty open set $A$, we have $p_t(x, A) > 0$ for all $x$;

• regular if $p_t(x, \cdot) \sim p_t(y, \cdot)$ for every $x$, $y$ and $t > 0$.
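For a finite-state continuous-time chain the regularity condition can be checked combinatorially. The sketch below relies on two standard facts, stated here as assumptions of the example: on a finite space two measures are equivalent exactly when they have the same support, and $p_t(i, j) > 0$ for $t > 0$ exactly when $j$ can be reached from $i$ through transitions of positive rate (or $j = i$). The function names and the two rate matrices are hypothetical.

```python
def reachable_sets(rates):
    """reach[i] = set of states j with p_t(i, j) > 0 for t > 0 (finite-state chain)."""
    d = len(rates)
    reach = [{i} | {j for j in range(d) if j != i and rates[i][j] > 0} for i in range(d)]
    changed = True
    while changed:                      # transitive closure by repeated expansion
        changed = False
        for i in range(d):
            bigger = set().union(*(reach[j] for j in reach[i]))
            if not bigger <= reach[i]:
                reach[i] |= bigger
                changed = True
    return reach

def is_regular(rates):
    """True if all transition laws p_t(i, .) share one support (mutual equivalence)."""
    reach = reachable_sets(rates)
    return all(r == reach[0] for r in reach)

ergodic = [[-1.0, 1.0], [2.0, -2.0]]          # both states communicate
with_transient = [[-1.0, 1.0], [0.0, 0.0]]    # state 0 is transient, state 1 is absorbing
regular_ergodic = is_regular(ergodic)
regular_with_transient = is_regular(with_transient)
```

The second chain has a single ergodic class but fails to be regular, since the law started in the absorbing state never charges the transient state.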

A well known result of Has'minskiĭ [16] states the following.

Lemma 7.9. Any irreducible strong Feller semigroup is regular. 

The following theorem and its immediate corollary are the main results of this 
section. Recall that, by assumption, the signal is a Markov process in its own right. 

Theorem 7.10. If the signal process is regular, then stability implies strong sta- 
bility. 

Corollary 7.11. Regularity of the signal and observability imply strong stability of 
the filter. In particular, the filter is strong stable if the signal-observation model is 
observable and the signal process is irreducible and strong Feller. 

For the proof of the theorem, we need the following counterpart of lemma 6.22.

Lemma 7.12. The pair $(\pi_t^\mu, \pi_t^\nu)$ is a Feller-Markov process under $\mathbf{P}^\mu$.

Proof. This follows as in the proof of [21, theorem 2.3]. The details are omitted. □

This implies that as in the finite state case, we can construct the filter on its
canonical path space $\Omega = D([0,\infty[;\mathcal{P}(\mathbb{S})) \times D([0,\infty[;\mathcal{P}(\mathbb{S}))$ (here $\mathcal{P}(\mathbb{S})$ is endowed
with the topology of weak convergence, which turns it into a compact Polish space).
To be precise, denote by $\mathbf{P}_{(\mu,\nu)}$ the probability measure on $\Omega$ under which the
canonical processes $\pi_t(u,v) = u(t)$ and $\bar\pi_t(u,v) = v(t)$ have the same law as do $\pi_t^\mu$
and $\pi_t^\nu$ under $\mathbf{P}^\mu$. As before we introduce the natural filtration $\mathcal{G}_t = \sigma\{(\pi_s,\bar\pi_s) : s \le t\}$,
augmented with respect to the family $\mathbf{P}_{(\mu,\nu)}$, and the canonical shift
$\theta_t(u,v)(s) = (u(s+t), v(s+t))$. It then follows that the process $(\pi_t,\bar\pi_t)$ satisfies the usual Markov
property with respect to the filtration $\mathcal{G}_t$ and the family $\mathbf{P}_{(\mu,\nu)}$.




To proceed, we apply the Markov property. This yields 




The result follows by letting $s \to \infty$ and using lemma 6.20.



□ 
The strategy for proving theorem 7.10 is now straightforward. What we will
show is that if the signal process is regular, then $\pi_t^\mu \sim \pi_t^\nu$ a.s. for any $t > 0$,
regardless of $\mu$ and $\nu$. Using the Markov property of the filter, the strong stability
problem then reduces to the ordinary stability problem.

Proof of theorem 7.10. Denote by $\mathbf{Q}_t^\mu$ the law of $X_t$ under $\mathbf{P}^\mu$. From regularity, it
follows that $\mathbf{Q}_t^\mu \sim \mathbf{Q}_t^\nu$ for any $\mu, \nu$ and $t > 0$. But from the Kallianpur-Striebel
formula, it follows directly that $\pi_t^\mu \sim \mathbf{Q}_t^\mu$ a.s. with

$$\frac{d\pi_t^\mu}{d\mathbf{Q}_t^\mu}(z) = \frac{\int_{\Omega^X} Z_t(x,\cdot)\,\mathbf{Q}^\mu(dx\,|\,x(t)=z)}{\int_{\Omega^X} Z_t(x,\cdot)\,\mathbf{Q}^\mu(dx)}.$$

Hence evidently $\pi_t^\mu \sim \pi_t^\nu$ a.s. for any $\mu, \nu$ and $t > 0$. Now note that for $f \in C_b(\mathbb{S})$,

$$\mathbf{E}_{(\mu,\nu)}\Big(\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)|\,\Big|\,\mathcal{G}_s\Big) = \mathbf{E}_{(\pi_s,\bar\pi_s)}\Big(\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)|\Big),$$

where we have used the Markov property and the fact that the random variable
$\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)|$ is invariant under the shift $\theta_s$. But we have just
established that $\pi_t \sim \bar\pi_t$ a.s., and thus the right hand side of this expression
vanishes a.s. due to the fact that the filter is already assumed to be stable. Thus
$\limsup_{t\to\infty}|\pi_t(f)-\bar\pi_t(f)| = 0$ a.s., and the claim is established. □

The regularity of the signal process is closely related to the classical notion of
controllability. Suppose that $\mathbb{S}$ is a compact connected $C^\infty$-manifold, and that the
signal process $X_t$ is the solution of the Stratonovich stochastic differential equation

$$dX_t = F(X_t)\,dt + G(X_t)\circ dW_t, \qquad X_0 \in \mathbb{S},$$

where $F$ and $G$ are $C^\infty$-vector fields on $\mathbb{S}$. Then $X_t$ is a Markov process as usual
with transition probabilities $p_t(x, A)$. Consider also the associated control system

$$\frac{d}{dt}\Xi_t = F(\Xi_t) + G(\Xi_t)\,u(t), \qquad \Xi_0 \in \mathbb{S},$$

where $u(t)$ is the control input. We denote by $A_t(x) \subset \mathbb{S}$ the set of points $\Xi_t$
which are reachable from $\Xi_0 = x$ by the application of a piecewise smooth control
signal $u$, and call the signal controllable if $A_t(x) = \mathbb{S}$ for every $x \in \mathbb{S}$ and $t > 0$.
It follows from the Stroock-Varadhan support theorem that controllability is a
sufficient condition for irreducibility of the signal [22]. Moreover, the controllability
assumption additionally implies hypoellipticity of the diffusion $X_t$ (see the remark
on [22, page 175]), which gives rise to the strong Feller property.

Thus evidently a sufficient condition for strong stability of the filter, for a diffu- 
sion signal on a compact manifold with white noise type observations, is that the 
signal is controllable and the filtering model is observable. This mirrors precisely 
the well known controllability-observability criterion for the stability of the Kalman 
filter [SJI^nj. Indeed, it is not difficult to verify that the linear filtering model is 
observable in the sense of this paper precisely when the well known observability 
rank condition is satisfied, while the linear signal is controllable precisely when the 
controllability rank condition is satisfied (though, unfortunately, the Kalman filter 
does not fit into the current setting as its state space is not compact). 
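The two rank conditions mentioned here are easy to evaluate for a concrete linear model. The following sketch computes the classical observability matrix (rows $C, CA, \ldots, CA^{n-1}$) and controllability matrix (columns $B, AB, \ldots, A^{n-1}B$) with exact rational arithmetic. The double-integrator example and all function names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [u - factor * v for u, v in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{n-1} vertically."""
    blocks, block = [], C
    for _ in range(len(A)):
        blocks.extend(block)
        block = matmul(block, A)
    return blocks

def controllability_matrix(A, B):
    """Place B, AB, ..., A^{n-1}B side by side."""
    blocks, block = [], B
    for _ in range(len(A)):
        blocks.append(block)
        block = matmul(A, block)
    return [sum((blk[i] for blk in blocks), []) for i in range(len(A))]

A = [[0, 1], [0, 0]]          # double integrator: position and velocity
B = [[0], [1]]                # noise enters through the velocity
C_position = [[1, 0]]         # observe the position
C_velocity = [[0, 1]]         # observe the velocity only

controllable = rank(controllability_matrix(A, B)) == len(A)
observable_position = rank(observability_matrix(A, C_position)) == len(A)
observable_velocity = rank(observability_matrix(A, C_velocity)) == len(A)
```

Observing the position gives full rank, while observing only the velocity does not: the initial position can never be inferred from velocity measurements.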

Remark 7.13. Regularity of the signal process is certainly not a necessary condition 
for strong stability. In the finite state setting, for example, regularity occurs only 
when there is a single ergodic class and there are no transient states. We have seen, 
however, that strong stability still holds true in the presence of transient states.
The latter situation is analogous to the stabilizability criterion for the stability of 
the Kalman filter [53]. One might hope that also stabilizability and detectability 
have natural counterparts in the general setting, but we will not pursue this here. 

On the other hand, we remark that stabilizability and detectability are generally
considered together in the stability theory for the Kalman filter, while the results of
this paper indicate that these conditions play rather separate roles. In particular,
it is to be expected that detectability is a sufficient condition for the stability of
the Kalman filter even in the absence of stabilizability, provided that the initial
distributions are absolutely continuous $\mu \ll \nu$. That this is indeed the case (under
slightly stronger conditions) is shown in the appendix.²

Both the Kalman filter and the finite state case give rise to conditions for observability
and controllability which are easily computed explicitly in terms of matrices.
For general diffusions, the matter appears to be much more complicated. To estab- 
lish controllability one may employ certain Lie-algebraic computations, as detailed 
in [52]. The question of observability for signals on a non-finite state space does 
not appear to have been studied at all in the literature. 

A slightly stronger condition than observability, however, is closely related to
the classical observability problem for (deterministic) infinite-dimensional linear
systems. Suppose that we have white noise type or counting observations, so that
observability is determined by $\mathcal{F}^h = \sigma\{h(X_t) : t \ge 0\}$. Rather than require every
$\mu \ne \nu$ to merely give rise to $\mathbf{P}^\mu|_{\mathcal{F}^h} \ne \mathbf{P}^\nu|_{\mathcal{F}^h}$, we could ask whether $\mu \ne \nu$
implies that $\mathbf{P}^\mu|_{\sigma\{h(X_t)\}} \ne \mathbf{P}^\nu|_{\sigma\{h(X_t)\}}$ for some $t > 0$. The latter is clearly a
sufficient condition for observability, where only the one-dimensional distributions
of the process $h(X_t)$ are taken into account (compare with remark 6.8).

Now denote by $R_t$ the Markov semigroup of the signal process. Then there exists
a dual semigroup $R_t^*$, which acts on the space of measures $\mathcal{M}(\mathbb{S})$, such that $R_t^*\mu$ is
the law of $X_t$ under $\mathbf{P}^\mu$. Moreover, let us define the projection map $\Pi : \mathcal{M}(\mathbb{S}) \to \mathcal{M}(h(\mathbb{S}))$
such that $\Pi : \mu \mapsto \mu\circ h^{-1}$. Then we can consider $\mathsf{X}_t = R_t^*\mu$ as defining the
dynamics of an infinite-dimensional linear system with $\mathsf{X}_0 = \mu$ and with
infinite-dimensional linear observations $\mathsf{Y}_t = \Pi\mathsf{X}_t$. The classical observability problem
associated with this infinite-dimensional linear model characterizes precisely when
$\mu \ne \nu$ implies that $\mathbf{P}^\mu|_{\sigma\{h(X_t)\}} \ne \mathbf{P}^\nu|_{\sigma\{h(X_t)\}}$ for some $t > 0$. A detailed treatment
of observability problems of this type can be found in [31].
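When the signal state space is finite, the linear system above becomes finite-dimensional and the condition reduces to a rank test. The following worked sketch is an illustration under assumed notation that does not come from the paper: $\mathbb{S} = \{a_1,\ldots,a_d\}$, measures are row vectors, $\Lambda$ is the transition intensity matrix of the signal, $h$ takes the distinct values $b_1,\ldots,b_m$, and $H$ is the $d \times m$ matrix with $H_{ij} = 1$ if $h(a_i) = b_j$ and $H_{ij} = 0$ otherwise.

```latex
% Finite-state sketch; the symbols \Lambda, H, d, m are assumptions of this example.
\begin{align*}
  R_t^*\mu &= \mu\, e^{t\Lambda}, \qquad \Pi\mu = \mu H, \\
  \mu\, e^{t\Lambda} H = \nu\, e^{t\Lambda} H \ \text{for all } t \ge 0
    &\iff (\mu-\nu)\,\Lambda^k H = 0 \ \text{for } k = 0,1,\dots,d-1, \\
  \text{no such pair } \mu \ne \nu \text{ exists}
    &\iff \operatorname{rank}\begin{bmatrix} H & \Lambda H & \cdots & \Lambda^{d-1}H \end{bmatrix} = d.
\end{align*}
```

The first equivalence follows by differentiating in $t$ at $t = 0$ and applying the Cayley-Hamilton theorem. For the second, note that any row vector $\rho$ with $\rho H = 0$ has total mass zero (each row of $H$ sums to one), so a nonzero $\rho$ in the left kernel is a multiple of a difference of two distinct probability vectors.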

8. The non-compact case 

In the preceding sections we have considered exclusively the case where the signal
state space $\mathbb{S}$ is compact, so that $C(\mathbb{S}) = C_b(\mathbb{S}) = C_0(\mathbb{S})$. When $\mathbb{S}$ is not compact,
as in the common setting where $\mathbb{S} = \mathbb{R}^n$, for example, one could try to extend the
proofs to show the stability of functions in $C_0(\mathbb{S})$. The latter space of functions
is the obvious choice from the point of view of our techniques, as $C_0(\mathbb{S})^* = \mathcal{M}(\mathbb{S})$
even when $\mathbb{S}$ is only locally compact. However, from a practical point of view the
stability of functions in $C_0(\mathbb{S})$ is too restrictive; indeed, if the signal is transient
then the filtered estimate for any such function will be stable, but this fact is of
little interest (as the filtered estimates of functions that vanish at infinity yield no
information on a transient signal as $t \to \infty$). Instead, one should consider the larger
class of continuous bounded functions $C_b(\mathbb{S})$ or of all continuous functions $C(\mathbb{S})$.

² However, the method used in the appendix to prove stability of the Kalman filter is unrelated
to the techniques developed in the body of this paper.

Unfortunately, the techniques which we have developed in the previous sections
do not extend directly to this setting. The problem is, of course, that the dual
of $C_b(\mathbb{S})$ (with respect to the uniform topology) is no longer $\mathcal{M}(\mathbb{S})$ when $\mathbb{S}$ is not
compact; rather, $C_b(\mathbb{S})^*$ can be characterized as $\mathcal{M}(\beta\mathbb{S})$, where $\beta\mathbb{S}$ denotes the
Stone-Čech compactification of $\mathbb{S}$. A direct analog of our observability condition in
this setting would thus require that no two initial measures give rise to the same
observation statistics, even when those measures have some mass distribution "at
infinity". Though it is perhaps not surprising that the observability "at infinity"
plays a role in this setting, the space $\beta\mathbb{S}$ is sufficiently unwieldy that a direct
extension of this type does not appear to lead to a useful theory.

In the remainder of this section we discuss two simple extensions of our results
to the non-compact case. The first approach is inspired by the previous discussion;
if the signal admits a tractable compactification $\alpha\mathbb{S}$, our previous results can be
applied. This yields stability of those functions in $C_b(\mathbb{S})$ which admit a continuous
extension to $\alpha\mathbb{S}$. The second approach assumes that the signal process is tight, so
that the difficulties of a transient signal are avoided. In this case it is no longer
necessary that $\mathcal{O}$ is the uniform closure of $\mathcal{O}^0$; using tightness, it is sufficient to
consider the closure of $\mathcal{O}^0$ with respect to the topology of uniform convergence on
compact sets. This resolves our problems, as the dual of $C_b(\mathbb{S})$ endowed with the
latter topology is $\mathcal{M}_c(\mathbb{S})$, the space of compactly supported finite signed measures.

Remark 8.1. It is only fair to remark that neither of these approaches is particularly 
satisfying. In particular, the natural test case for the theory, the Kalman filter with 
an unstable signal, is not covered by these approaches. (The Kalman filter for stable 
signals is not particularly interesting, as such filters are always stable regardless of 
observability; see, e.g., the result in the appendix). Further work is needed to 
develop an approach that covers unstable signals in a more satisfactory manner. 

8.1. Compactification. We consider a signal-observation model $(X_t, Y_t)$ as in
section 2, except that $\mathbb{S}$ is not assumed to be compact. Let us assume, furthermore,
that the observations are of the white noise or counting type as in section 5, and
that the observation function is continuous and bounded, $h \in C_b(\mathbb{S})$.

Definition 8.2. Let $\alpha\mathbb{S}$ be a compact Polish space, and consider a filtering model
$(X_t^\alpha, Y_t^\alpha)$ with signal state space $\alpha\mathbb{S}$ and observation function $h^\alpha$ (the observation
model is chosen to coincide with that of $(X_t, Y_t)$). Then $(X_t^\alpha, Y_t^\alpha)$ is a
compactification of $(X_t, Y_t)$ if there exists a continuous injection $\pi : \mathbb{S} \to \alpha\mathbb{S}$ such that

(1) $h^\alpha \in C_b(\alpha\mathbb{S})$ and $h = h^\alpha \circ \pi$;

(2) For any $\mu \in \mathcal{P}(\mathbb{S})$, the process $(\pi(X_t), \pi(Y_t))$ with initial law $(X_0, Y_0) \sim \mu$
has the same law as $(X_t^\alpha, Y_t^\alpha)$ with initial law $(X_0^\alpha, Y_0^\alpha) \sim \mu\circ\pi^{-1}$.

The set $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$ is called the set of points at infinity.

Denote by $C^\alpha(\mathbb{S}) = \{f \in C_b(\mathbb{S}) : f = f^\alpha\circ\pi \text{ for some } f^\alpha \in C_b(\alpha\mathbb{S})\}$ the set of
bounded continuous functions on $\mathbb{S}$ that admit a continuous extension to $\alpha\mathbb{S}$ (note
that $C^\alpha(\mathbb{S})$ will always be strictly smaller than $C_b(\mathbb{S})$, unless $\alpha\mathbb{S} = \beta\mathbb{S}$). Then the
following results follow immediately from our definitions.



OBSERVABILITY AND NONLINEAR FILTERING 






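Proposition 8.3. If the filter for the model $(X_t^\alpha, Y_t^\alpha)$ is stable, then the filter for
the model $(X_t, Y_t)$ is $\alpha$-stable, i.e., $\mu \ll \nu$ implies

$$|\mathbf{E}^\mu(f(X_t)\,|\,\mathcal{F}_t^Y) - \mathbf{E}^\nu(f(X_t)\,|\,\mathcal{F}_t^Y)| \xrightarrow{t\to\infty} 0 \qquad \mathbf{P}^\mu\text{-a.s. for all } f \in C^\alpha(\mathbb{S}).$$

Corollary 8.4. Observability of the filtering model $(X_t^\alpha, Y_t^\alpha)$ implies that the filter
for $(X_t, Y_t)$ is $\alpha$-stable. In particular, $\alpha$-stability is guaranteed if $h^\alpha$ is one-to-one.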

We develop further a particular setting in which this result can be exploited. Set
$\mathbb{S} = \mathbb{R}^n$, and consider a signal which solves the Itô stochastic differential equation

$$dX_t = f(X_t)\,dt + g(X_t)\,dW_t.$$

We assume that $f$, $g$ are continuously differentiable and of sublinear growth:

$$f, g \in C^1, \qquad \|f(x)\| \le K(1+\|x\|^a), \qquad \|g(x)\| \le K(1+\|x\|^a), \qquad a < 1.$$

We now consider a compactification which adjoins to $\mathbb{R}^n$ a sphere at infinity $S^{n-1}$;
this can be done, for example, by choosing $\alpha\mathbb{S}$ to be the closed unit ball $\{x \in \mathbb{R}^n : \|x\| \le 1\}$
and setting $\pi(x) = (1+\|x\|^2)^{-1/2}x$. E.g., when $n = 1$, this reduces to the
two-point compactification $[-\infty,\infty]$ of the real line $\mathbb{R} = \,]-\infty,\infty[$, in which case
$C^\alpha(\mathbb{R})$ is precisely the set of functions $f \in C_b(\mathbb{R})$ such that $\lim_{x\to\pm\infty} f(x)$ exist.
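The map $\pi$ and the notion of a continuously extendable function can be made concrete in a few lines. The sketch below implements $\pi(x) = (1+\|x\|^2)^{-1/2}x$ together with its inverse on the open unit ball, and uses $\arctan$ as an example of a bounded continuous function on $\mathbb{R}$ with limits at $\pm\infty$. The function names and the choice of $\arctan$ are illustrative assumptions.

```python
import math

def compactify(x):
    """pi(x) = x / sqrt(1 + ||x||^2): maps R^n into the open unit ball."""
    scale = 1.0 / math.sqrt(1.0 + sum(v * v for v in x))
    return [scale * v for v in x]

def decompactify(z):
    """Inverse of `compactify` on the open unit ball: x = z / sqrt(1 - ||z||^2)."""
    scale = 1.0 / math.sqrt(1.0 - sum(v * v for v in z))
    return [scale * v for v in z]

def f(x):
    """A bounded continuous function on R with limits -pi/2 and pi/2 at -inf and +inf."""
    return math.atan(x)

def f_alpha(z):
    """Continuous extension of f to [-1, 1], the image of the two-point compactification."""
    if abs(z) >= 1.0:
        return math.copysign(math.pi / 2.0, z)   # value at the point at infinity
    return math.atan(decompactify([z])[0])

point = [3.0, -4.0]
image = compactify(point)                 # lies strictly inside the unit ball
recovered = decompactify(image)           # recovers the original point up to rounding
```

Here `f_alpha` composed with `compactify` agrees with `f` on the real line, which is exactly the relation $f = f^\alpha\circ\pi$ in the definition of $C^\alpha(\mathbb{S})$.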

Now choose $h \in C^\alpha(\mathbb{S})$, and let $Y_t$ be a white noise type or counting observation
model with observation function $h$. Then it follows from [24, example 3] that
there is a compactification $(X_t^\alpha, Y_t^\alpha)$ of $(X_t, Y_t)$ with the additional property that
if $X_0^\alpha \in \alpha\mathbb{S}\setminus\pi(\mathbb{S})$ a.s., then $X_t^\alpha = X_0^\alpha$ for all $t > 0$ a.s. (i.e., the points at infinity
are fixed points for the compactified signal $X_t^\alpha$). We can exploit the latter to give
a criterion for $\alpha$-stability in terms of properties of the non-compactified model.

Proposition 8.5. Suppose that the following conditions hold:

(1) $(X_t, Y_t)$ is observable (no two initial measures give rise to the same law of
the observation process);

(2) The restriction of $h^\alpha$ to $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$ is one-to-one;

(3) $X_t$ does not possess an invariant manifold that is contained in some level
set $\{x \in \mathbb{S} : h(x) = u\}$ with $u \in \{h^\alpha(x) : x \in \alpha\mathbb{S}\setminus\pi(\mathbb{S})\}$.

Then the filter for $(X_t, Y_t)$ is $\alpha$-stable.

Remark 8.6. For condition (3) to be satisfied, it is sufficient to establish that we
have $\{h^\alpha(x) : x \in \alpha\mathbb{S}\setminus\pi(\mathbb{S})\} \cap \{h(x) : x \in \mathbb{S}\} = \emptyset$. A sufficient condition for (1)-(3)
to be satisfied is that $h^\alpha$ is one-to-one.

Proof. Let $\mu$, $\nu$ be two measures on $\alpha\mathbb{S}$ with $\mu \ne \nu$. To prove the proposition, it
suffices to establish that the law of $h^\alpha(X_t^\alpha)$ is different for $X_0^\alpha \sim \mu$ and $X_0^\alpha \sim \nu$.

First, suppose that $\mu$, $\nu$ are supported on $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$. Recall that if $X_0^\alpha \in \alpha\mathbb{S}\setminus\pi(\mathbb{S})$,
then $X_t^\alpha = X_0^\alpha$ for all $t > 0$. As $h^\alpha$ is one-to-one on $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$, this implies the claim.

Next, suppose that $\mu$ is supported on $\pi(\mathbb{S})$, while $\nu$ is supported on $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$.
If $X_0^\alpha \sim \mu$ and $X_0^\alpha \sim \nu$ were to give rise to the same law for $h^\alpha(X_t^\alpha)$, then
$h^\alpha(X_t^\alpha)$ would have to be an a.s. constant process under both measures, so that in
particular $\mu$ must be supported on the union of the invariant manifolds of $X_t$ which
are contained in level sets of $h$. But by the third assumption the process $h^\alpha(X_t^\alpha)$
can then not take the same values under $\mu$ and $\nu$, so that we have a contradiction.

Now let $\mu$ and $\nu$ be arbitrary. Then we can write $\mu = a\mu_1 + (1-a)\mu_2$ and
$\nu = b\nu_1 + (1-b)\nu_2$, where $a, b \in [0,1]$, $\mu_1$, $\nu_1$ are supported on $\pi(\mathbb{S})$, and $\mu_2$, $\nu_2$ are
supported on $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$. Note that by our assumptions, the probability that $h^\alpha(X_t^\alpha)$
is a constant process which takes values in $\{h^\alpha(x) : x \in \alpha\mathbb{S}\setminus\pi(\mathbb{S})\}$ is precisely $1-a$
when $X_0^\alpha \sim \mu$ and $1-b$ when $X_0^\alpha \sim \nu$. Hence if $\mu$, $\nu$ give rise to the same observation
law, then $a = b$ and $\mu_2 = \nu_2$ (the latter follows as $h^\alpha$ is assumed to be one-to-one
on $\alpha\mathbb{S}\setminus\pi(\mathbb{S})$). But then we can conclude that $\mu$, $\nu$ give rise to the same observation
law only if $\mu_1$ and $\nu_1$ give rise to the same observation law, and the latter implies
$\mu_1 = \nu_1$ by our first assumption. Hence the proof is complete. □

Remark 8.7. It is likely that the compactification approach described in this section 
can be generalized to a larger class of signals. However, the requirement that 
$h \in C^\alpha(\mathbb{S})$ appears to rule out any model in which the observation function is
unbounded. In practice, on the other hand, the unbounded observation case is 
much more natural when S is non-compact, and one would even expect stability to 
improve in this setting. The restriction to bounded observation functions is thus a 
significant drawback of the compactification approach. In particular, this rules out 
the application of this approach to the Kalman filter. 

8.2. Tight signal. In this section, we consider a signal-observation model (X t , Y t ) 
as in section^ except that § is assumed to be only locally compact. Throughout this 
subsection, the space of bounded continuous functions $C_b(\mathbb{S})$ will always be endowed
with the topology of uniform convergence on compact sets. This is a locally convex
topology, and gives rise to the duality $C_b(\mathbb{S})^* = \mathcal{M}_c(\mathbb{S})$ [10, proposition IV.4.1]. We
note that lemma 3.5 extends also to this setting:

Lemma 8.8. Let $M \subset C_b(\mathbb{S})$ be a linear subspace. Then $(M^\perp)^\perp = \overline{M}$, where $\overline{M}$
is the closure of $M$ in the topology of uniform convergence on compact sets.

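This can be proved in the same way as lemma 3.5, or follows as a special case of
[10, theorem V.1.8]. In this setting, we will define

$$\mathcal{N} = \{a\mu_1 - a\mu_2 \in \mathcal{M}_c(\mathbb{S}) : a \in \mathbb{R},\ \mu_1, \mu_2 \in \mathcal{P}_c(\mathbb{S}),\ \mu_1 \simeq \mu_2\},$$

$$\mathcal{O} = \{f \in C_b(\mathbb{S}) : \mu_1(f) = \mu_2(f) \text{ for all } \mu_1 \simeq \mu_2 \text{ with } \mu_1, \mu_2 \in \mathcal{P}_c(\mathbb{S})\}.$$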

That is, we will consider bounded functions (which do not necessarily vanish at
infinity), but we need only consider measures which are compactly supported. We
now obtain the following analog of proposition 3.6.

Proposition 8.9. Let $\mathcal{O}^0$ be the linear span of functions of the form

E(»,„)(/i(^i - Y )h{Y t2 - Y ) ■ ■ ■ f n (Y tn - Y )), 

for all $n < \infty$, $t_i \in D$ and $f_i \in C_b(\mathbb{O})$, where $D$ is a dense subset of $[0,\infty[$. Then
$\mathcal{O}^0$ is dense in $\mathcal{O}$ in the topology of uniform convergence on compact sets.

The proof is identical to that of proposition 3.6, so we do not repeat it. We are
now in the position to prove a stability theorem, similar to theorem 4.4, provided
we assume that the signal is tight. The notion of stability is also somewhat weaker
than that of theorem 4.4, as a.s. convergence is replaced by convergence in $L^1$.

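Theorem 8.10. Let $\mu \ll \nu$, and assume that the signal $X_t$ is tight in the sense
that for every $\varepsilon > 0$, there is a compact set $K_\varepsilon \subset \mathbb{S}$ such that $\mathbf{P}^\nu(X_t \in K_\varepsilon) > 1-\varepsilon$
for all $t \ge 0$. Then $\mathbf{E}^\mu(|\mathbf{E}^\mu(f(X_t)\,|\,\mathcal{F}_t^Y) - \mathbf{E}^\nu(f(X_t)\,|\,\mathcal{F}_t^Y)|) \to 0$ as $t \to \infty$ for any $f \in \mathcal{O}$.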

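Proof. For $f \in \mathcal{O}^0$, the result follows directly from lemma 4.1. Now choose $\varepsilon, \delta > 0$.
By the tightness assumption and [32, page 146(b)], we can choose a compact set
$K$ such that $\mathbf{P}^\mu(X_t \in K) > 1-\varepsilon$ and $\mathbf{P}^\nu(X_t \in K) > 1-\varepsilon$ for all $t \ge 0$. We also
choose $f_\delta \in \mathcal{O}^0$ such that $\sup_{x\in K}|f(x) - f_\delta(x)| < \delta$. Then

$$\begin{aligned}
|\pi_t^\mu(f) - \pi_t^\nu(f)| \le{}& |\pi_t^\mu(f) - \pi_t^\mu(fI_K)| + |\pi_t^\mu(fI_K) - \pi_t^\mu(f_\delta I_K)| + |\pi_t^\mu(f_\delta I_K) - \pi_t^\mu(f_\delta)| \\
&+ |\pi_t^\mu(f_\delta) - \pi_t^\nu(f_\delta)| + |\pi_t^\nu(f_\delta) - \pi_t^\nu(f_\delta I_K)| \\
&+ |\pi_t^\nu(f_\delta I_K) - \pi_t^\nu(fI_K)| + |\pi_t^\nu(fI_K) - \pi_t^\nu(f)|,
\end{aligned}$$

where we have written $\pi_t^\mu(f) = \mathbf{E}^\mu(f(X_t)\,|\,\mathcal{F}_t^Y)$ and $I_K$ denotes the indicator function of $K$.
The $\mathbf{E}^\mu$-expectation of the middle term $|\pi_t^\mu(f_\delta) - \pi_t^\nu(f_\delta)|$ converges to zero
by the first part of the proof, as $f_\delta \in \mathcal{O}^0$. Note that

$$\mathbf{E}^\mu(|\pi_t^\mu(f) - \pi_t^\mu(fI_K)|) \le \|f\|\,\mathbf{P}^\mu(X_t \notin K) \le \varepsilon\|f\|,$$
$$\mathbf{E}^\mu(|\pi_t^\mu(f_\delta I_K) - \pi_t^\mu(f_\delta)|) \le \|f_\delta\|\,\mathbf{P}^\mu(X_t \notin K) \le \varepsilon\|f_\delta\|,$$

while

$$\mathbf{E}^\mu(|\pi_t^\mu(fI_K) - \pi_t^\mu(f_\delta I_K)|) \le \delta, \qquad \mathbf{E}^\mu(|\pi_t^\nu(f_\delta I_K) - \pi_t^\nu(fI_K)|) \le \delta.$$

Now fix $\gamma > 0$, and note that

$$\begin{aligned}
\mathbf{E}^\mu(|\pi_t^\nu(fI_K) - \pi_t^\nu(f)|) &\le \|f\|\,\mathbf{E}^\nu\Big(\tfrac{d\mu}{d\nu}(X_0)\,\pi_t^\nu(K^c)\Big) \\
&\le \|f\|\,\mathbf{E}^\nu\Big(\tfrac{d\mu}{d\nu}(X_0)\,I_{\frac{d\mu}{d\nu}(X_0) > \gamma}\Big) + \gamma\|f\|\,\mathbf{P}^\nu(X_t \in K^c) \\
&\le \|f\|\,\mathbf{E}^\nu\Big(\tfrac{d\mu}{d\nu}(X_0)\,I_{\frac{d\mu}{d\nu}(X_0) > \gamma}\Big) + \varepsilon\gamma\|f\|,
\end{aligned}$$

and similarly

$$\mathbf{E}^\mu(|\pi_t^\nu(f_\delta) - \pi_t^\nu(f_\delta I_K)|) \le \|f_\delta\|\,\mathbf{E}^\nu\Big(\tfrac{d\mu}{d\nu}(X_0)\,I_{\frac{d\mu}{d\nu}(X_0) > \gamma}\Big) + \varepsilon\gamma\|f_\delta\|.$$

It follows that

$$\limsup_{t\to\infty}\mathbf{E}^\mu(|\pi_t^\mu(f) - \pi_t^\nu(f)|) \le 2\delta + \Big(\varepsilon + \varepsilon\gamma + \mathbf{E}^\nu\Big(\tfrac{d\mu}{d\nu}(X_0)\,I_{\frac{d\mu}{d\nu}(X_0) > \gamma}\Big)\Big)(\|f\| + \|f_\delta\|).$$

But $\delta, \varepsilon, \gamma > 0$ were arbitrary, so the result follows by letting $\delta, \varepsilon \to 0$, $\gamma \to \infty$. □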

Remark 8.11. If we consider all continuous functions $C(\mathbb{S})$, rather than bounded
continuous functions $C_b(\mathbb{S})$, then it is still true that $C(\mathbb{S})^* = \mathcal{M}_c(\mathbb{S})$ when $C(\mathbb{S})$
is topologized by uniform convergence on compact sets. Thus, in fact, we may
consider the set of continuous observable functions

$$\mathcal{O} = \{f \in C(\mathbb{S}) : \mu_1(f) = \mu_2(f) \text{ for all } \mu_1 \simeq \mu_2 \text{ with } \mu_1, \mu_2 \in \mathcal{P}_c(\mathbb{S})\},$$

and it is still the case that $\mathcal{O}^0$ (which contains only bounded continuous functions)
is dense in $\mathcal{O}$. This may be exploited, under suitable additional restrictions on the
signal process, to prove stability of unbounded observable functions that satisfy an 
appropriate growth condition. For example, if § = R™, is uniformly inte- 

grable and ||d/x/di/|| g < 00, then the proof of the previous theorem is easily modified 
to show that observable functions of polynomial growth of degree k (depending on 
p, q) are stable. See [H proposition 3.3] for a similar argument. 
Appendix A. On stability of the Kalman filter

We have shown, for a reasonably general class of nonlinear filters, that observability
implies filter stability provided that we choose absolutely continuous initial
measures $\mu \ll \nu$. A similar result for linear filtering models is well known, and
the stability of the Kalman filter has been studied already for several decades (see,
e.g., [6]; a more recent account can be found in [26]). These results, however, do
not tend to assume that $\mu \ll \nu$, while on the other hand stabilizability is typically
required in addition to detectability. The goal of this appendix is to illustrate that
also in the linear setting, the stabilizability condition can be disposed of if we are
willing to impose an absolute continuity requirement on the initial conditions. This
highlights the separate roles of stabilizability and detectability in this setting.

Remark A.1. We will make no attempt at generality and prove only the simplest
possible result, for the purpose of illustration, by applying readily available results
from the literature. Although the conclusion is hardly surprising and the
proof is straightforward, the author could not find any such result in the literature.

We consider the following linear signal-observation model: 

dX t = AX t dt + BdW t , 
dY t = CX t dt + dB t , 

where $A$, $B$, and $C$ are $n \times n$, $n \times m$, and $p \times n$ matrices, respectively, and $X_0$ is
Gaussian with mean $\hat X_0$ and covariance matrix $P_0$. As is well known, the filtered
estimate $\hat X_t = \mathbf{E}(X_t\,|\,\mathcal{F}_t^Y)$ and covariance $P_t = \mathbf{E}((X_t - \hat X_t)(X_t - \hat X_t)^*)$ satisfy

$$d\hat X_t = A\hat X_t\,dt + P_tC^*(dY_t - C\hat X_t\,dt),$$

$$\frac{dP_t}{dt} = AP_t + P_tA^* + BB^* - P_tC^*CP_t.$$

The first equation is the Kalman filtering equation, while the second is the
Riccati equation. We would like to compare the solution of these equations with the
solutions $\hat X_t'$, $P_t'$ of the same equations with incorrect initial conditions $\hat X_0'$, $P_0'$.

Proposition A.2. Suppose $(A, C)$ is detectable and $P_0 > 0$, $P_0' > 0$. Then there
is a $P_\infty$ such that $P_t, P_t' \to P_\infty$ as $t \to \infty$, and $\mathbf{E}\|\hat X_t - \hat X_t'\| \to 0$ as $t \to \infty$.
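The first claim of the proposition can be illustrated on a scalar example in which stabilizability fails. The sketch below integrates the scalar Riccati equation $dP_t/dt = 2aP_t + b^2 - c^2P_t^2$ by the explicit Euler method, with the assumed values $a = 1$, $b = 0$, $c = 1$. The function name, step size, horizon, and initial conditions are hypothetical choices made for illustration.

```python
def riccati_limit(p0, a, b, c, dt=1e-3, horizon=20.0):
    """Euler integration of dP/dt = 2aP + b^2 - c^2 P^2 (the scalar Riccati equation)."""
    p = p0
    for _ in range(int(horizon / dt)):
        p += dt * (2.0 * a * p + b * b - c * c * p * p)
    return p

# Scalar model with A = 1, B = 0, C = 1: (A, C) is detectable (indeed observable),
# while (A, B) is not stabilizable because the unstable mode receives no noise.
a, b, c = 1.0, 0.0, 1.0
p_correct = riccati_limit(0.5, a, b, c)     # correct initial covariance P_0 = 0.5 > 0
p_wrong = riccati_limit(5.0, a, b, c)       # incorrect initial covariance P_0' = 5.0 > 0
p_degenerate = riccati_limit(0.0, a, b, c)  # P_0 = 0 violates the assumption P_0 > 0
```

For this example the equation reads $dP_t/dt = P_t(2 - P_t)$, so both positive initial conditions converge to $P_\infty = 2$, and $A - P_\infty C^*C = -1$. The degenerate start $P_0 = 0$ stays at zero, which shows why strict positivity of the initial covariances is assumed.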

Remark A.3. It is known that under the conditions of the proposition the solution
of the Riccati equation converges to a unique limit $P_\infty$. There is a key difference
with the stabilizable case, however: in the current setting, the matrix $A - P_\infty C^*C$
could be singular, while in the stabilizable case the matrix $A - P_\infty C^*C$ is guaranteed
to be strictly negative. The current situation is thus more subtle, and the proof of
[26] does not immediately extend to this setting.

Proof. The existence of the (nonnegative definite) matrix $P_\infty$ and the convergence
of $P_t$, $P_t'$ is established in [11] (see also [27] for more recent results). It remains to
establish the second part of the proposition. To this end, recall that the innovation
$d\bar B_t = dY_t - C\hat X_t\,dt$ is a Wiener process. We begin by writing

$$d(\hat X_t - \hat X_t') = (A - P_t'C^*C)(\hat X_t - \hat X_t')\,dt + (P_t - P_t')C^*\,d\bar B_t.$$

Now recall that the fact that $(A, C)$ is detectable implies that there exists a matrix
$K$ such that $A - KC$ only has eigenvalues with strictly negative real parts. Fix any
such $K$, define $F = KC - A$, and note that we can write

$$\hat X_T - \hat X_T' = e^{-FT}(\hat X_0 - \hat X_0') + \int_0^T e^{-F(T-t)}(K - P_t'C^*)C(\hat X_t - \hat X_t')\,dt + \int_0^T e^{-F(T-t)}(P_t - P_t')C^*\,d\bar B_t.$$

We claim that each of these terms converges to zero in $L^1$. Let us consider each
term individually. The first term clearly converges to zero, as the eigenvalues of $F$
have strictly positive real parts. For the second term, note that there are constants
$c_1, \lambda > 0$ such that $\|e^{-F(T-t)}\| \le c_1e^{-\lambda(T-t)}$, and that as $P_t' \to P_\infty$ it must be the
case that $\|P_t'\|$ is bounded from above by some constant $c_2$. We can thus estimate

$$\left\|\int_0^T e^{-F(T-t)}(K - P_t'C^*)C(\hat X_t - \hat X_t')\,dt\right\| \le c_3\int_0^T e^{-\lambda(T-t)}\|C(\hat X_t - \hat X_t')\|\,dt,$$

where we have lumped the various constants into $c_3 > 0$. But by [2, theorem 3.1]

$$\mathbf{E}\int_0^\infty \|C(\hat X_t - \hat X_t')\|^2\,dt < \infty \implies \int_0^\infty \big(\mathbf{E}\|C(\hat X_t - \hat X_t')\|\big)^2\,dt < \infty.$$

Hence the second term converges to zero in $L^1$ as $T \to \infty$ by lemma A.4 below. It
remains to deal with the third term. To this end, note that

$$\mathbf{E}\left\|\int_0^T e^{-F(T-t)}(P_t - P_t')C^*\,d\bar B_t\right\|^2 = \int_0^T |||e^{-F(T-t)}(P_t - P_t')C^*|||^2\,dt,$$

where $|||\cdot|||$ is the Frobenius norm. It is easily seen, again using lemma A.4 below,
that this expression converges to zero as $T \to \infty$. Hence the stochastic integral
converges to zero in $L^2$, and thus also in $L^1$. The proof is complete. □

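The following simple lemma was used in the proof.

Lemma A.4. Let $\lambda > 0$ and $f : [0,\infty[ \to \mathbb{R}$. Then

$$\int_0^\infty f(t)^2\,dt < \infty \implies \int_0^T e^{-\lambda(T-t)}f(t)\,dt \xrightarrow{T\to\infty} 0.$$

The result also holds if we require $f(t) \to 0$ instead of the $L^2$ bound.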
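Proof. Note that for any $s > 0$, we can estimate

$$\limsup_{T\to\infty}\left|\int_0^T e^{-\lambda(T-t)}f(t)\,dt\right| \le \limsup_{T\to\infty}\left|\int_0^s e^{-\lambda(T-t)}f(t)\,dt\right| + \limsup_{T\to\infty}\left|\int_s^T e^{-\lambda(T-t)}f(t)\,dt\right|.$$

Clearly the first term on the right is zero. For the second term, note that

$$\left|\int_s^T e^{-\lambda(T-t)}f(t)\,dt\right| \le \left[\int_s^T e^{-2\lambda(T-t)}\,dt\int_s^T f(t)^2\,dt\right]^{1/2}$$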

RAMON VAN HANDEL 



by Cauchy-Schwarz. The result follows by letting $T \to \infty$, then $s \to \infty$. To prove
that the result also holds if $f(t) \to 0$, it suffices to repeat the proof with

$$\left|\int_s^T e^{-\lambda(T-t)}f(t)\,dt\right| \le \sup_{t\ge s}|f(t)|\int_s^T e^{-\lambda(T-t)}\,dt,$$

where it should be noted that $f(t) \to 0$ implies $\limsup_{t\to\infty}|f(t)| = 0$. □